We investigate the time-varying ARCH (tvARCH) process. It is 
shown that it can be used to describe the slow decay of the sample 
autocorrelations of the squared returns often observed in financial 
time series, which warrants the further study of parameter estimation 
methods for the model. 

Since the parameters are changing over time, a successful estima- 
tor needs to perform well for small samples. We propose a kernel 
normalized-least-squares (kernel-NLS) estimator which has a closed 
form, and thus outperforms the previously proposed kernel quasi- 
maximum likelihood (kernel-QML) estimator for small samples. The 
kernel-NLS estimator is simple, works under mild moment assump- 
tions and avoids some of the parameter space restrictions imposed 
by the kernel-QML estimator. Theoretical evidence shows that the 
kernel-NLS estimator has the same rate of convergence as the kernel- 
QML estimator. Due to the kernel-NLS estimator's ease of computa- 
tion, computationally intensive procedures can be used. A prediction- 
based cross-validation method is proposed for selecting the band- 
width of the kernel-NLS estimator. Also, we use a residual-based 
bootstrap scheme to bootstrap the tvARCH process. The bootstrap 
sample is used to obtain pointwise confidence intervals for the kernel- 
NLS estimator. It is shown that distributions of the estimator using 
the bootstrap and the "true" tvARCH estimator asymptotically co- 
incide. 

We illustrate our estimation method on a variety of currency ex- 
change and stock index data for which we obtain both good fits to 
the data and accurate forecasts. 
1. Introduction. Among models for log-returns $X_t = \log(P_t/P_{t-1})$ on 
speculative prices $P_t$ (such as currency exchange rates, share prices, stock 
indices, etc.), the stationary ARCH(p) [Engle (1982)] and GARCH(p, q) 
[Bollerslev (1986) and Taylor (1986)] processes have gained particular pop- 
ularity and have become standard in the financial econometrics literature 
as they model well the volatility of financial markets over short periods of 
time. For a review of recent advances on those and related models, we refer 
the reader to Fan and Yao (2003) and Giraitis, Leipus and Surgailis (2005). 

The modeling of financial data using nonstationary time series models 
has recently attracted considerable attention. Arguments for using such 
models were laid out, for example, in Fan, Jiang, Zhang and Zhou (2003), 
Mikosch and Starica (2000, 2003, 2004), Mercurio and Spokoiny (2004a, 
2004b), Starica and Granger (2005) and Fryzlewicz et al. (2006). 

Recently, Dahlhaus and Subba Rao (2006) generalized the class of ARCH(p) 
processes to include processes whose parameters were allowed to change 
"slowly" through time. The resulting model, called the time-varying ARCH(p) 
[tvARCH(p)] process, is defined as 

(1)  $X_{t,N} = \sigma_{t,N} Z_t, \qquad \sigma_{t,N}^2 = a_0(t/N) + \sum_{j=1}^{p} a_j(t/N) X_{t-j,N}^2,$

for $t = 1, 2, \ldots, N$, where $\{Z_t\}_t$ are independent and identically distributed 
random variables with $E(Z_t) = 0$ and $E(Z_t^2) = 1$. In this paper, we focus 
on how the tvARCH(p) process can be used to characterize some of the 
features present in financial data, estimation methods for small samples, 
bootstrapping the tvARCH(p) process and the fitting of the tvARCH(p) 
process to data. 
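
As a rough illustration of the tvARCH(p) recursion defined above, the following sketch simulates one sample path. The function name `simulate_tvarch`, the interface (a list of coefficient functions on (0, 1]), the Gaussian innovations and the zero pre-sample values are all assumptions made for this example; the model itself only requires innovations with mean zero and unit variance.

```python
import math
import random


def simulate_tvarch(a_funcs, n, seed=0):
    """Simulate X_{t,N}, t = 1..n, from the tvARCH(p) recursion.

    a_funcs is [a_0, a_1, ..., a_p], each a callable on (0, 1]
    (a hypothetical interface chosen for this sketch). Innovations are
    standard Gaussian, one admissible choice with E(Z) = 0, E(Z^2) = 1.
    """
    rng = random.Random(seed)
    p = len(a_funcs) - 1
    x = [0.0] * p  # assumed zero pre-sample values X_{0,N}, ..., X_{1-p,N}
    for t in range(1, n + 1):
        u = t / n  # rescaled time t/N
        sigma2 = a_funcs[0](u) + sum(
            a_funcs[j](u) * x[-j] ** 2 for j in range(1, p + 1)
        )
        x.append(math.sqrt(sigma2) * rng.gauss(0.0, 1.0))
    return x[p:]  # drop the pre-sample values


# Example: tvARCH(1) with a slowly increasing intercept a_0(u) = 1 + u.
path = simulate_tvarch([lambda u: 1.0 + u, lambda u: 0.3], 500, seed=1)
```

Passing the coefficients as functions of rescaled time keeps the code close to the notation $a_j(t/N)$; a stationary ARCH(p) path is obtained by passing constant functions.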

In Section 2, we show how the tvARCH(p) process can be used to de- 
scribe the slow decay of the sample autocorrelations of the squared returns 
often observed in financial log-returns and usually attributed to the long 
memory of the underlying process. This is despite the true nonstationary 
correlations decaying geometrically fast to zero. Thus, the tvARCH(p) process, 
due to its nonstationarity, captures the appearance of long memory 
which is present in many financial datasets: a feature also exhibited by a 
short memory GARCH(1,1) process with structural breaks [Mikosch and 
Starica (2000, 2003, 2004); note that this effect goes back to Bhattacharya, 
Gupta and Waymire (1983)]. 

The benchmark method for the estimation of stationary ARCH(p) pa- 
rameters is the quasi-maximum likelihood (QML) estimator. Motivated by 
this, Dahlhaus and Subba Rao (2006) use a localized kernel-based quasi- 
maximum likelihood (kernel-QML) method for estimating the parameters of 
a tvARCH(p) process. However, the kernel-QML estimator is not very reliable 
for small sample sizes, since the quasi-likelihood tends to be shallow about the 
minimum in that case [Shephard (1996) and Bose and Mukherjee (2003)]. 
This is of particular relevance to tvARCH(p) processes, where 
in regions of nonstationarity, we need to base our estimator on only a few 
observations to avoid a large bias. Furthermore, the parameter space of 
the estimator is restricted to $\inf_j a_j(u) > 0$. However, it is suggested in the 
examples in Section 6 that over large periods of time some of the higher-order 
parameters should be zero. This renders the assumption $\inf_j a_j(u) > 0$ 
rather unrealistic. In addition, evaluation of the kernel-QML estimator at 
every time point is computationally quite intensive. Therefore, bandwidth 
selection based on a data driven procedure, where the kernel-QML estimator 
has to be evaluated at each time point for different bandwidths, may not be 
feasible for even moderately large sample sizes. 

A rival class of estimators are least-squares-based and are known to have 
good small-sample properties [Bose and Mukherjee (2003)]. These types of 
estimators will be the focal point in this paper. In Section 3 and the fol- 
lowing sections, we propose and thoroughly analyze a (suitably localized 
and normalized) least-squares-type estimator for the tvARCH(p) process 
which, unlike the kernel-QML estimator mentioned above, enjoys the following 
properties: (i) very good performance for small samples, (ii) simplicity 
and closed form and (iii) rapid computability. In addition, it does allow 
$\inf_j a_j(u) = 0$, thereby avoiding the parameter space restriction described 
above. 

In Section 3.1, we consider a general class of localized weighted least-squares 
estimators for tvARCH(p) processes and study their sampling properties. 
We show that their small sample performance, sampling properties 
and moment assumptions depend on the weight function used. 

In Section 3.3, we investigate weight functions that lead to estimators 
which are close to the kernel-QML estimator for large samples and easy to 
compute. In fact, we show that the weight functions which have the most de- 
sirable properties contain unknown parameters. This motivates us in Section 
3.4 to propose the two-stage kernel normalized-least-squares (kernel-NLS) 
estimator where in the first stage we estimate the weight function which we 
use in the second stage as the weight in the least-squares estimator. The 
two-stage kernel-NLS estimator has the same sampling properties as if the 
true weight function were a priori known, and has the same rate of con- 
vergence as the kernel-QML estimator. In Section 3.6, we state some of the 
results from extensive simulation studies which show that for small sample 
sizes the two-stage kernel-NLS estimator performs better than the kernel- 
QML estimator. This suggests that at least in the nonstationary setup, the 
two-stage kernel-NLS estimator is a viable alternative to the kernel-QML 
estimator. 

In Section 4, we propose a cross-validation method for selecting the bandwidth 
of the two-stage kernel-NLS estimator. The proposed cross-validation 
procedure for tvARCH(p) processes is based on one-step-ahead prediction of 
the data to select the bandwidth. The closed form solution of the two-stage 
kernel-NLS estimator means that, for every bandwidth, the estimator can be 
evaluated rapidly. The computational ease of the two-stage kernel-NLS estimator 
means that it is simple to implement a cross-validation method based 
on this scheme. We discuss some of the implementation issues associated 
with the procedure and show that its computational complexity remains 
low. 

In Section 5, we bootstrap the tvARCH(p) process. This allows us to 
obtain finite sample pointwise confidence intervals for the tvARCH(p) pa- 
rameter estimators. The scheme is based on bootstrapping the estimated 
residuals, which we use, together with the estimated tvARCH(p) parameters, 
to construct the bootstrap sample. Again, the fact that the bootstrapping 
scheme is computationally feasible is only due to the rapid computability of 
the two-stage kernel-NLS estimator. We show that the distribution of the 
bootstrap tvARCH(p) estimator asymptotically coincides with the "true" 
tvARCH(p) estimator. The method and results in this section may also be 
of independent interest. 

In Section 6, we demonstrate that our estimation methodology gives a 
very good fit to data for the USD/GBP currency exchange and FTSE stock 
index datasets, and we also exhibit bootstrap pointwise confidence intervals 
for the estimated parameters. In Section 7, we test the long-term volatility 
forecasting ability of the tvARCH(p) process with p = 0, 1, 2, where the 
parameters are estimated via the two-stage kernel-NLS estimator. We show 
that, for a variety of currency exchange datasets, our forecasting methodology 
outperforms the stationary GARCH(1,1) and EGARCH(1,1) techniques. 
However, it is interesting to observe that the latter two methods give 
slightly superior results for a selection of stock index datasets. 

Proofs of the results in the paper are outlined in the Appendix. Further 
details of the proofs can be found in the accompanying technical report, 
available from the authors or from 
http://www.maths.bris.ac.uk/~mapzf/tvarch/trNLS.pdf. 

2. The tvARCH(p) process: preliminary results and motivation. In 
this section, we discuss some of the properties of the tvARCH(p) process. 

2.1. Notation, assumptions and main ingredients. We first state the as- 
sumptions used throughout the paper. 

Assumption 1. Suppose $\{X_{t,N}\}_t$ is a tvARCH(p) process. We assume 
that the time-varying parameters $\{a_j(u)\}_j$ and the innovations $\{Z_t\}_t$ satisfy 
the following conditions: 

(i) There exist $0 < \rho_1 \le \rho_2 < \infty$ and $0 < \delta < 1$ such that, for all $u \in (0,1]$, 
$\rho_1 \le a_0(u) \le \rho_2$, and $\sup_u \sum_{j=1}^{p} a_j(u) \le 1 - \delta$. 

(ii) There exist $\beta \in (0,1]$ and a finite constant $K > 0$ such that for $u, v \in (0,1]$ 

$|a_j(u) - a_j(v)| \le K |u - v|^{\beta}$ for each $j = 0, 1, \ldots, p$. 

(iii) For some $\gamma > 0$, $E(|Z_t|^{4(1+\gamma)}) < \infty$. 

(iv) For some $\eta > 0$ and $0 < \delta < 1$, $m_{1+\eta} \sup_u \sum_{j=1}^{p} a_j(u) \le 1 - \delta$, where 
$m_{1+\eta} = \{E(|Z_t|^{2(1+\eta)})\}^{1/(1+\eta)}$. 

Assumption 1(i) implies that $\sup_{t,N} E(X_{t,N}^2) < \infty$. Assumption 1(i), (ii) 
means that the tvARCH(p) process can locally be approximated by a stationary 
process. We require Assumption 1(iii), (iv) to show asymptotic normality 
of the two-stage kernel-NLS estimator (defined in Section 3.4). Comparing 
$m_{1+\eta} \sup_u \sum_{j=1}^{p} a_j(u) \le 1 - \delta$ with the assumption required to show 
asymptotic normality of the kernel-QML estimator ($m_1 \sup_u \sum_{j=1}^{p} a_j(u) \le 1 - \delta$, 
where we note that $m_1 = 1$), it is only a mildly stronger assumption, 
as we only require it to hold for some $\eta > 0$. In other words, if the moment 
function $m_\nu$ increases smoothly with $\nu$, and $m_1 \sup_u \sum_{j=1}^{p} a_j(u) \le 1 - \delta$, then 
there exist an $\eta > 0$ and a $0 < \delta_1 < 1$ such that $m_{1+\eta} \sup_u \sum_{j=1}^{p} a_j(u) \le 1 - \delta_1$ 
[which satisfies Assumption 1(iv)]. 
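
To get a feel for how mild the step from $m_1$ to $m_{1+\eta}$ is, the sketch below evaluates the moment function for standard Gaussian innovations, using the standard formula for absolute moments of a normal variable. The Gaussian choice and the helper names `gaussian_abs_moment` and `m` are assumptions of this example only; the assumption in the text is stated for general innovations.

```python
import math


def gaussian_abs_moment(r):
    """E|Z|^r for standard normal Z: 2^(r/2) * Gamma((r+1)/2) / sqrt(pi)."""
    return 2.0 ** (r / 2.0) * math.gamma((r + 1.0) / 2.0) / math.sqrt(math.pi)


def m(nu):
    """Moment function m_nu = {E|Z_t|^(2 nu)}^(1/nu), Gaussian innovations."""
    return gaussian_abs_moment(2.0 * nu) ** (1.0 / nu)


# m(1.0) equals E(Z^2) = 1; for small eta > 0, 1 / m(1.0 + eta) is the
# bound on sup_u sum_j a_j(u) implied (up to delta) by Assumption 1(iv).
bound_for_small_eta = 1.0 / m(1.05)
```

Since $m_\nu$ is continuous and increasing in $\nu$ for Gaussian innovations, the admissible range for $\sup_u \sum_j a_j(u)$ shrinks only slightly when $\eta$ is small.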

In order to prove results concerning the tvARCH(p) process, Dahlhaus 
and Subba Rao (2006) define the stationary process $\{X_t(u)\}_t$. Let $u \in (0,1]$ 
and suppose that, for each fixed $u$, $\{X_t(u)\}_t$ satisfies the model 

(2)  $X_t(u) = \sigma_t(u) Z_t, \qquad \sigma_t^2(u) = a_0(u) + \sum_{j=1}^{p} a_j(u) X_{t-j}^2(u).$

The following lemma is a special case of Corollary 4.2 in Subba Rao 
(2006), where it was shown that $\{X_t^2(u)\}_t$ can be regarded as a stationary 
approximation of the nonstationary process $\{X_{t,N}^2\}_t$ about $u \approx t/N$, which 
is why $\{X_{t,N}\}_t$ can be regarded as a locally stationary process. We can treat 
the lemma below as the stochastic version of Hölder continuity. 

Lemma 1. Suppose $\{X_{t,N}\}_t$ is a tvARCH(p) process which satisfies 
Assumption 1(i), (ii), and let $\{X_t(u)\}_t$ be defined as in (2). Then, for each 
fixed $u \in (0,1]$, we have that $\{X_t^2(u)\}_t$ is a stationary, ergodic process such 
that 

(3)  $|X_{t,N}^2 - X_t^2(u)| \le \frac{1}{N^{\beta}} V_{t,N} + \left| u - \frac{t}{N} \right|^{\beta} W_t$  almost surely, 

and $|X_t^2(u) - X_t^2(v)| \le |u - v|^{\beta} W_t$, almost surely, where $\{V_{t,N}\}_t$ and $\{W_t\}_t$ 
are well-defined positive processes, and $\{W_t\}_t$ is a stationary process. In addition, 
if we assume that Assumption 1(iv) holds, then we have 
$\sup_{t,N} E|V_{t,N}|^{1+\eta} < \infty$ and $E|W_t|^{1+\eta} < \infty$. 
Several of the estimators considered in this paper [e.g., the estimators 
defined in (4) and (7), etc.] are local or global averages of functions of the 
tvARCH(p) process. Unlike stationary ARCH(p) (or more general stationary) 
processes, we cannot study the sampling properties of these estimators 
by simply letting the sample size grow. Instead, we use the rescaling by $N$ to 
obtain a meaningful asymptotic theory. The underlying principle of studying 
an estimator at a particular time $t$ is to keep the ratio $t/N$ fixed and 
let $N \to \infty$ [Dahlhaus (1997)]. However, the tvARCH(p) process varies for 
different $N$, which is the reason for introducing the stationary approximation. 

Throughout the paper, $\stackrel{P}{\to}$ and $\stackrel{D}{\to}$ denote convergence in probability and 
in distribution, respectively. 

2.2. The covariance structure and the long memory effect. The following 
proposition shows the behavior of the true autocovariance function of the 
squares of a tvARCH(p) process. 

Proposition 1. Suppose $\{X_{t,N}\}_t$ is a tvARCH(p) process which satisfies 
Assumption 1(i), (ii), and assume that $\{E(Z_t^4)\}^{1/2} \sup_u \sum_{j=1}^{p} a_j(u) \le 1 - \delta$, 
for some $0 < \delta < 1$. Then, for some $\rho \in (1 - \delta, 1)$ and a fixed $h \ge 0$, 
we have 

$\sup_{t,N} |\mathrm{cov}\{X_{t,N}^2, X_{t+h,N}^2\}| \le K \rho^{h},$

for some finite constant $K > 0$ that is independent of $h$. 

If the fourth moment of the process $\{X_{t,N}\}_t$ exists, then Proposition 1 
implies that $\{X_{t,N}^2\}_t$ is a short memory process. 

However, we now show that the sample autocovariance of the process 
$\{X_{t,N}^2\}_t$, computed under the wrong premise of stationarity, does not necessarily 
decay to zero. Typically, if we believed that the process $\{X_{t,N}^2\}_t$ were 
stationary, we would use $S_N(h)$ as an estimator of $\mathrm{cov}\{X_{t,N}^2, X_{t+h,N}^2\}$, where 

(4)  $S_N(h) = \frac{1}{N-h} \sum_{t=1}^{N-h} X_{t,N}^2 X_{t+h,N}^2 - (\bar{X}_N)^2$

and 

$\bar{X}_N = \frac{1}{N-h} \sum_{t=1}^{N-h} X_{t,N}^2.$

Denote $\mu(u) = E(X_t^2(u))$ and $c(u,h) = \mathrm{cov}\{X_t^2(u), X_{t+h}^2(u)\}$ for each 
$u \in (0,1]$ and $h > 0$. 
The following proposition shows the behavior of the sample autocovariance 
of the squares of a tvARCH(p) process, evaluated under the wrong 
assumption of stationarity. 

Proposition 2. Suppose $\{X_{t,N}\}_t$ is a tvARCH(p) process which satisfies 
Assumption 1(i), (ii), and assume that, for some $0 < \zeta \le 2$ and $0 < \delta < 1$, 
$\{E(|Z_t|^{2(2+\zeta)})\}^{1/(2+\zeta)} \sup_u \sum_{j=1}^{p} a_j(u) \le 1 - \delta$. Then, for fixed $h > 0$, as 
$N \to \infty$, we have 

(5)  $S_N(h) \stackrel{P}{\to} \int_0^1 c(u,h)\,du + \int\!\!\int_{\{0 \le u < v \le 1\}} \{\mu(u) - \mu(v)\}^2 \,du\,dv.$

According to Proposition 2, since the autocovariance of the squares of 
a tvARCH(p) process decays to zero exponentially fast as $h \to \infty$, so does 
the first integral in (5). However, persistent correlations 
would still appear if the second integral were nonzero. We consider the simple 
example when the mean of the squares increases linearly, that is, $\mu(u) = cu$, 
for some nonzero constant $c$. In this case, the second integral in (5) reduces 
to $c^2/12$. In other words, the long memory effect is due to changes in the 
unconditional variance of the tvARCH(p) process. 
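
The value $c^2/12$ in the linear example can be checked numerically. The sketch below approximates the second integral in (5) by a midpoint rule on the triangle $0 \le u < v \le 1$; the function name `second_integral` and the grid size are choices made for this illustration only.

```python
def second_integral(mu, m=400):
    """Midpoint-rule approximation of the second integral in (5):
    the integral of {mu(u) - mu(v)}^2 over the region 0 <= u < v <= 1."""
    step = 1.0 / m
    grid = [(i + 0.5) * step for i in range(m)]
    total = 0.0
    for i, u in enumerate(grid):
        for v in grid[i + 1:]:  # only pairs with u < v
            total += (mu(u) - mu(v)) ** 2
    return total * step * step


# Linear mean of the squares, mu(u) = c * u with c = 3: compare with c^2 / 12.
c = 3.0
approx = second_integral(lambda u: c * u)
exact = c ** 2 / 12.0
```

A constant $\mu$ gives zero, in line with the remark that the effect stems from changes in the unconditional variance.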

3. The kernel-NLS estimator and its asymptotic properties. Typically, 
to estimate the parameters of a stationary ARCH(p) process, a QML esti- 
mator is used, where the likelihood is constructed as if the innovations were 
Gaussian. The main advantage of the QML estimator is that, even in the case 
that the innovations are non-Gaussian, it is consistent and asymptotically 
normal. In contrast, Straumann (2005) has shown that under misspecifica- 
tion of the innovation distribution, the resulting non-Gaussian maximum 
likelihood estimator is inconsistent. As it is almost impossible to specify the 
distribution of the innovations, this makes the QML estimator the benchmark 
method when estimating stationary ARCH(p) parameters. 

A localized version of the QML estimator is used to estimate the pa- 
rameters of a tvARCH(p) process in Dahlhaus and Subba Rao (2006). To 
prove the sampling results, the asymptotics are done in the rescaled time 
framework. In practice, a good estimator is obtained if the process is close 
to stationary over a relatively large region. However, the story is completely 
different over much shorter regions. As noted in Section 1, in estimation 
over a short period of time (which will often be the case for nonstationary 
processes), the performance of the QML estimator is quite poor. 

Rival methods are least-squares-type estimators, which are known to have 
good small sample properties. In this section, we focus on kernel weighted 
least-squares as a method for estimating the parameters of a tvARCH(p) 
process. To this end, we define the kernel $W : [-1/2, 1/2] \to \mathbb{R}$, which is a 
function of bounded variation and satisfies the standard conditions: 
$\int_{-1/2}^{1/2} W(x)\,dx = 1$ and $\int_{-1/2}^{1/2} W^2(x)\,dx < \infty$. 
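3.1. Kernel weighted least-squares for tvARCH(p) processes. It is straightforward 
to show that the squares of the tvARCH(p) process satisfy the 
autoregressive representation 
$X_{k,N}^2 = a_0(k/N) + \sum_{j=1}^{p} a_j(k/N) X_{k-j,N}^2 + (Z_k^2 - 1)\sigma_{k,N}^2$. 
For reasons that will become obvious later, we weight the least-squares 
representation with the weight function $\kappa(u_0, \mathcal{X}_{k-1,N})$, where 
$\mathcal{X}_{k-1,N}^T = (1, X_{k-1,N}^2, \ldots, X_{k-p,N}^2)$, and define the following weighted 
least-squares criterion: 

(6)  $\mathcal{L}_{t_0,N}(\underline{\alpha}) = \sum_{k=p+1}^{N} \frac{1}{bN} W\left(\frac{t_0 - k}{bN}\right) \frac{(X_{k,N}^2 - \alpha_0 - \sum_{j=1}^{p} \alpha_j X_{k-j,N}^2)^2}{\kappa(u_0, \mathcal{X}_{k-1,N})^2}.$

If $|u_0 - t_0/N| < 1/N$, we use $\hat{\underline{a}}_{t_0,N}$ as an estimator of 
$\underline{a}(u_0) = (a_0(u_0), a_1(u_0), \ldots, a_p(u_0))^T$, where 

(7)  $\hat{\underline{a}}_{t_0,N} = \arg\min_{\underline{\alpha}} \mathcal{L}_{t_0,N}(\underline{\alpha}).$

Since $\hat{\underline{a}}_{t_0,N}$ is a least-squares estimator, it has the advantage of a closed form 
solution, that is, $\hat{\underline{a}}_{t_0,N} = \{\mathcal{R}_{t_0,N}\}^{-1} \underline{r}_{t_0,N}$, where 

$\mathcal{R}_{t_0,N} = \sum_{k=p+1}^{N} \frac{1}{bN} W\left(\frac{t_0 - k}{bN}\right) \frac{\mathcal{X}_{k-1,N} \mathcal{X}_{k-1,N}^T}{\kappa(u_0, \mathcal{X}_{k-1,N})^2},$

$\underline{r}_{t_0,N} = \sum_{k=p+1}^{N} \frac{1}{bN} W\left(\frac{t_0 - k}{bN}\right) \frac{X_{k,N}^2 \mathcal{X}_{k-1,N}}{\kappa(u_0, \mathcal{X}_{k-1,N})^2}.$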
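3.2. Asymptotic properties of the kernel weighted least-squares estimator. 
We now obtain the asymptotic sampling properties of $\hat{\underline{a}}_{t_0,N}$. 

To show asymptotic normality we require the following definitions: 

(8)  $\mathcal{A}_k(u) = \frac{\mathcal{X}_{k-1}(u) \mathcal{X}_{k-1}(u)^T}{\kappa(u, \mathcal{X}_{k-1}(u))^2}, \qquad \mathcal{D}_k(u) = \frac{\sigma_k^4(u)\, \mathcal{X}_{k-1}(u) \mathcal{X}_{k-1}(u)^T}{\kappa(u, \mathcal{X}_{k-1}(u))^4}$

and 

(9)  $\mathcal{B}_{t_0,N}(\underline{\alpha}) = \sum_{k=p+1}^{N} \frac{1}{bN} W\left(\frac{t_0 - k}{bN}\right) \left\{ \frac{(X_{k,N}^2 - \alpha_0 - \sum_{j=1}^{p} \alpha_j X_{k-j,N}^2)^2}{\kappa(u_0, \mathcal{X}_{k-1,N})^2} - \frac{(X_k^2(u_0) - \alpha_0 - \sum_{j=1}^{p} \alpha_j X_{k-j}^2(u_0))^2}{\kappa(u_0, \mathcal{X}_{k-1}(u_0))^2} \right\},$

where $\mathcal{X}_{t-1}(u)^T = (1, X_{t-1}^2(u), \ldots, X_{t-p}^2(u))$. We point out that if $\{X_{t,N}\}_t$ 
were a stationary process then $\mathcal{B}_{t_0,N}(\underline{\alpha}) \equiv 0$. 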
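In the following proposition we obtain consistency and asymptotic normality 
of $\hat{\underline{a}}_{t_0,N}$. We denote 
$\nabla f(u, \underline{a}) = \left( \frac{\partial f(u,\underline{a})}{\partial a_0}, \ldots, \frac{\partial f(u,\underline{a})}{\partial a_p} \right)^T$, 
and $x = (1, x_1, x_2, \ldots, x_p)$ and $y = (1, y_1, y_2, \ldots, y_p)$. 

Proposition 3. Suppose $\{X_{t,N}\}_t$ is a tvARCH(p) process which satisfies 
Assumption 1(i), (ii), (iii), and let $\hat{\underline{a}}_{t_0,N}$, $\mathcal{A}_t(u)$, $\mathcal{D}_t(u)$ and $\mathcal{B}_{t_0,N}(\underline{\alpha})$ 
be defined as in (7), (8) and (9), respectively. We further assume that 
$\kappa$ is bounded away from zero and we have a type of Lipschitz condition 
on the weighted least-squares; that is, for all $1 \le i \le p$, 
$\left| \frac{x_i}{\kappa(u,x)} - \frac{y_i}{\kappa(u,y)} \right| \le K \sum_{j=1}^{p} |x_j - y_j|$, for some finite constant $K > 0$. 
Also, assume for all $1 \le i \le p$ that 
$\sup_{k,N} E\left( \frac{X_{k-i,N}^4}{\kappa(u_0, \mathcal{X}_{k-1,N})^2} \right) < \infty$, and suppose $|u_0 - t_0/N| < 1/N$. 

(i) Then we have $\hat{\underline{a}}_{t_0,N} \stackrel{P}{\to} \underline{a}(u_0)$, with $b \to 0$, $bN \to \infty$ as $N \to \infty$. 

(ii) If in addition we assume for all $1 \le i \le p$ and some $\nu > 0$ that 
$\sup_{k,N} E\left( \frac{X_{k-i,N}^{8+\nu}}{\kappa(u_0, \mathcal{X}_{k-1,N})^{4+\nu/2}} \right) < \infty$, then we have 
$\nabla \mathcal{B}_{t_0,N}(\underline{a}(u_0)) = O_p(b^{\beta})$ and 

(10)  $\sqrt{bN}\,(\hat{\underline{a}}_{t_0,N} - \underline{a}(u_0)) + \tfrac{1}{2} \sqrt{bN}\, E[\mathcal{A}_t(u_0)]^{-1} \nabla \mathcal{B}_{t_0,N}(\underline{a}(u_0)) \stackrel{D}{\to} \mathcal{N}\left(0, w_2 \mu_4 E[\mathcal{A}_t(u_0)]^{-1} E[\mathcal{D}_t(u_0)] E[\mathcal{A}_t(u_0)]^{-1}\right),$

with $b \to 0$, $bN \to \infty$ as $N \to \infty$, where $w_2 = \int_{-1/2}^{1/2} W^2(x)\,dx$ and 
$\mu_4 = \mathrm{var}(Z_t^2)$. 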

At first glance the above assumptions may appear quite technical, but we 
note that in the case $\kappa(\cdot) \equiv 1$, they are standard in least-squares estimation. 
Furthermore, if the weight function $\kappa$ is bounded away from zero and Lipschitz 
continuous [i.e., $\sup_u |\kappa(u,x) - \kappa(u,y)| \le K \sum_{j=1}^{p} |x_j - y_j|$, for some 
finite constant $K > 0$], then it is straightforward to see that 
$\left| \frac{x_i}{\kappa(u,x)} - \frac{y_i}{\kappa(u,y)} \right| \le K \sum_{j=1}^{p} |x_j - y_j|$. 
In the following section, we will suggest a $\kappa(\cdot)$ that is ideal 
for tvARCH(p) estimation and satisfies the required conditions. 

3.3. Choice of weight function $\kappa$. By considering both theoretical and 
empirical evidence, we now investigate various choices of weight functions. 
To do this, we study Proposition 3 and consider the $\kappa$ which yields an estimator 
which requires only weak moment assumptions and has minimal 
error [see (10)]. Considering first the bias in (10), if $\sqrt{bN}\, b^{\beta} \to 0$, then the 
bias converges in probability to zero. Instead we focus attention on (i) the 
variance $E[\mathcal{A}_t(u_0)]^{-1} E[\mathcal{D}_t(u_0)] E[\mathcal{A}_t(u_0)]^{-1}$ and (ii) derivation under low 
moment assumptions. 

In the stationary ARCH framework, Giraitis and Robinson (2001), Bose 
and Mukherjee (2003), Horvath and Liese (2004) and Ling (2007) have considered 
the weighted least-squares estimator for different weight functions. 
Giraitis and Robinson (2001) use the Whittle likelihood to estimate the 
parameters of a stationary ARCH($\infty$) process. Adapted to the nonstationary 
setting, the local Whittle likelihood estimator and the local weighted 
least-squares estimator are asymptotically equivalent when $\kappa(\cdot) \equiv 1$. Studying 
their assumptions, $\sup_{t,N} E(X_{t,N}^4) < \infty$ and $\sup_{t,N} E(X_{t,N}^{8+\nu}) < \infty$, for 
some $\nu > 0$, are required to show consistency and asymptotic normality, respectively. 
Assuming normality of the innovations $\{Z_t\}_t$ and interpreting these conditions 
in terms of the coefficients of the tvARCH(p) process, they imply that 
$\sup_u \sum_{j=1}^{p} a_j(u) < 1/\sqrt{3}$ is required for consistency and 
$\sup_u \sum_{j=1}^{p} a_j(u) < 1/\{E(Z_t^{8+\nu})\}^{2/(8+\nu)}$ for asymptotic normality. 
In other words, the tvARCH(p) 
process should be close to a white noise process for the sampling results to 
be valid. 

On the other hand, Bose and Mukherjee (2003) use a two-stage least-squares 
procedure to estimate the stationary ARCH(p) parameters. In the 
first stage, they use least-squares with weight function $\kappa(\cdot) \equiv 1$ and in the 
second stage a least-squares estimator with $\kappa = \hat{\sigma}_t^2$, where $\hat{\sigma}_t^2$ is an estimator 
of the conditional variance. An advantage of their scheme is that, 
asymptotically, it has the same distribution variance as the QML estimator. 
However, because in the first stage they use the weight $\kappa(\cdot) \equiv 1$, their 
method requires the same set of assumptions as in Giraitis and Robinson 
(2001). 

To reduce the high moment restrictions, Horvath and Liese (2004) use 
random weights of the form $\kappa(u, \mathcal{X}_{k-1,N}) = 1 + \sum_{j=1}^{p} X_{k-j,N}^2$ to estimate 
stationary ARCH(p) parameters, and Ling (2007) uses a similar weighting to 
estimate the parameters of a stationary ARMA-GARCH process. The main 
advantage of using this choice of weight functions is that under Assumption 
1(i), (ii), (iii) the moment assumptions in Proposition 3 are satisfied. 

Motivated by the discussion above, let us consider weight functions which 
have the form $\kappa(u, \mathcal{X}_{k-1,N}) = g(u) + \sum_{j=1}^{p} \rho_j(u) X_{k-j,N}^2$. We will make some 
comparisons with the kernel-QML estimator considered in Dahlhaus and 
Subba Rao (2006), who showed that the kernel-QML estimator is asymptotically 
normal with variance $w_2 \mu_4 E[\Sigma_t(u_0)]^{-1}$, where 

(11)  $\Sigma_t(u) = \frac{\mathcal{X}_{t-1}(u) \mathcal{X}_{t-1}(u)^T}{\sigma_t^4(u)}.$

It is worth noting that if $\{\rho_j(u)\}$ are bounded away from zero, then the 
conditions in Proposition 3 are fulfilled with no additional assumptions. 
For the purposes of this discussion only, let us assume for a moment that 
$\inf_j a_j(u) > 0$ (although this is not a requirement for our estimation methodology 
to be valid). In order to select $g(\cdot)$ and $\rho_j(\cdot)$, we first observe that if 
$\underline{a}(u_0)$ were known then letting $\kappa(u_0, \mathcal{X}_{k-1,N}) = a_0(u_0) + \sum_{j=1}^{p} a_j(u_0) X_{k-j,N}^2$ 
would be the ideal choice [provided $\inf_j a_j(u_0) > 0$] as the asymptotic variance 
of the resulting kernel weighted least-squares estimator would be the 
same as the kernel-QML estimator. Clearly this weight function is unknown, 
and for this reason we call it the "oracle" weight. Instead, we look for a 
closely related alternative, which is computationally simple to evaluate and 
avoids the requirement that $\inf_j a_j(u_0) > 0$. Let us consider a weight function 
$\kappa(u, \mathcal{X}_{k-1,N}) = g(u) + \sum_{j=1}^{p} X_{k-j,N}^2$ [which is in the spirit of the solution 
proposed by Horvath and Liese (2004) for stationary ARCH(p) processes] 
and compare it to the oracle weight. For convenience, the estimator using 
the weight function $g(u) + \sum_{j=1}^{p} X_{k-j,N}^2$ we call the $g$-estimator, and the 
estimator using the oracle weight we call the oracle estimator. 

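Using Proposition 3, we see that the asymptotic distribution variance of 
the $g$-estimator and the oracle estimator is 
$w_2 \mu_4 E[\mathcal{A}_t^{(g)}(u)]^{-1} E[\mathcal{D}_t^{(g)}(u)] E[\mathcal{A}_t^{(g)}(u)]^{-1}$ and $w_2 \mu_4 E[\Sigma_t(u_0)]^{-1}$, 
respectively, where 

(12)  $\mathcal{A}_t^{(g)}(u) = \frac{\mathcal{X}_{t-1}(u) \mathcal{X}_{t-1}(u)^T}{[g(u) + \sum_{j=1}^{p} X_{t-j}^2(u)]^2}, \qquad \mathcal{D}_t^{(g)}(u) = \frac{\sigma_t^4(u)\, \mathcal{X}_{t-1}(u) \mathcal{X}_{t-1}(u)^T}{[g(u) + \sum_{j=1}^{p} X_{t-j}^2(u)]^4},$

and $\Sigma_t(u)$ is defined in (11). Let $a(u) = \sum_{j=1}^{p} a_j(u)$, $\beta(u) = 1/\min_{1 \le j \le p} a_j(u)$ 
and let $|\cdot|_{\det}$ denote the determinant of a matrix. By bounding $\mathcal{A}_t^{(g)}(u)$ and 
$\mathcal{D}_t^{(g)}(u)$ from both above and below we obtain 

(13)  $\varpi(g)^{-1} |E[\Sigma_t(u)]^{-1}|_{\det} \le |E[\mathcal{A}_t^{(g)}(u)]^{-1} E[\mathcal{D}_t^{(g)}(u)] E[\mathcal{A}_t^{(g)}(u)]^{-1}|_{\det} \le \varpi(g)\, |E[\Sigma_t(u)]^{-1}|_{\det},$

where 

$\varpi(g) = \left( \frac{a_0(u) + g(u) a(u)}{g(u)} \right)^{4(p+1)} \left( \frac{g(u) + \beta(u) a_0(u)}{a_0(u)} \right)^{4(p+1)}.$

Examining (13), we have an upper and lower bound for the asymptotic 
distribution variance of the $g$-estimator in terms of the asymptotic variance 
of the oracle estimator. It is easily seen that the difference 
$\{\varpi(g) - \varpi(g)^{-1}\} |E[\Sigma_t(u)]^{-1}|_{\det}$ and the upper bound $\varpi(g) |E[\Sigma_t(u)]^{-1}|_{\det}$ are minimized 
when $g^*(u) = a_0(u) / ([\min_{1 \le j \le p} a_j(u)] \sum_{j=1}^{p} a_j(u))$. However, $g^*(u)$ depends 
on unknown parameters and is highly sensitive to small values of $a_j(u)$, hence 
it is inappropriate as a weight function. Instead, we consider a close relative 
$g(u) := \mu(u) = a_0(u)/(1 - a(u))$, where $\mu(u) = E[X_t^2(u)]$. In this case, using 
(13), we obtain the following upper and lower bound for the asymptotic 
variance of the kernel weighted least-squares estimator in terms of the oracle 
variance: 

(14)  $\varpi(\mu)^{-1} |E[\Sigma_t(u)]^{-1}|_{\det} \le |E[\mathcal{A}_t^{(\mu)}(u)]^{-1} E[\mathcal{D}_t^{(\mu)}(u)] E[\mathcal{A}_t^{(\mu)}(u)]^{-1}|_{\det} \le \varpi(\mu)\, |E[\Sigma_t(u)]^{-1}|_{\det},$

where 

$\varpi(\mu) = \left( \frac{1 + \beta(u)[1 - a(u)]}{1 - a(u)} \right)^{4(p+1)}.$

We notice that the upper and lower bounds in (14) do not depend on the 
magnitude of $a_0(u)$. 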

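Since $a_0(u)/(1 - a(u)) = E(X_t^2(u)) = \mu(u)$, which is the local mean, it can easily be 
estimated from $\{X_{t,N}^2\}$. In the following section, we use it to estimate the 
weight function $\kappa(u_0, \mathcal{X}_{k-1,N}) = \mu(u_0) + S_{k-1,N}$, where $S_{k-1,N} = \sum_{j=1}^{p} X_{k-j,N}^2$. 
An additional advantage of this weight function $\kappa(u_0, \mathcal{X}_{k-1,N})$ is that under 
Assumption 1, $\sup_{k,N} E\left( \frac{X_{k-i,N}^4}{\kappa(u_0, \mathcal{X}_{k-1,N})^2} \right) < \infty$ and 
$\sup_{k,N} E\left( \frac{X_{k-i,N}^{8+\nu}}{\kappa(u_0, \mathcal{X}_{k-1,N})^{4+\nu/2}} \right) < \infty$ 
are immediately satisfied. Furthermore, $|\kappa(u,x) - \kappa(u,y)| \le K \sum_{j=1}^{p} |x_j - y_j|$, 
thus $|x_i/\kappa(u,x) - y_i/\kappa(u,y)| \le K \sum_{j=1}^{p} |x_j - y_j|$. Therefore, all the conditions 
in Proposition 3 hold. 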

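3.4. The two-stage kernel-NLS estimator. We use $\hat{\mu}_{t_0,N}$ as an estimator 
of $\mu(u_0)$ (see Lemma A.1 in the Appendix), where 

(15)  $\hat{\mu}_{t_0,N} = \sum_{k=1}^{N} \frac{1}{bN} W\left(\frac{t_0 - k}{bN}\right) X_{k,N}^2.$

We use this to define the two-stage kernel-NLS estimator of the tvARCH(p) 
parameters. 

The two-stage scheme: 

(i) Evaluate $\hat{\mu}_{t_0,N}$, given in (15), which is an estimator of $\mu(u_0)$. 

(ii) Let $\tilde{\underline{a}}_{t_0,N} = \{\mathcal{R}_{t_0,N}\}^{-1} \underline{r}_{t_0,N}$ with $S_{k-1,N} = \sum_{j=1}^{p} X_{k-j,N}^2$, 
$\hat{\kappa}_{t_0,N}(S_{k-1,N}) = (\hat{\mu}_{t_0,N} + S_{k-1,N})$ and 

(16)  $\mathcal{R}_{t_0,N} = \sum_{k=p+1}^{N} \frac{1}{bN} W\left(\frac{t_0 - k}{bN}\right) \frac{\mathcal{X}_{k-1,N} \mathcal{X}_{k-1,N}^T}{\hat{\kappa}_{t_0,N}(S_{k-1,N})^2}, \qquad \underline{r}_{t_0,N} = \sum_{k=p+1}^{N} \frac{1}{bN} W\left(\frac{t_0 - k}{bN}\right) \frac{X_{k,N}^2 \mathcal{X}_{k-1,N}}{\hat{\kappa}_{t_0,N}(S_{k-1,N})^2}.$

If $|u_0 - t_0/N| < 1/N$, we use $\tilde{\underline{a}}_{t_0,N}$ as an estimator of $\underline{a}(u_0)$. We call $\tilde{\underline{a}}_{t_0,N}$ 
the two-stage kernel-NLS estimator. 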
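
The two-stage scheme can be sketched compactly for the special case p = 1, where the normal equations are 2 x 2 and can be solved by Cramer's rule. This is an illustration under stated assumptions rather than the authors' code: the function name `two_stage_nls_p1`, the rectangular kernel and the storage convention `x[k-1]` for $X_{k,N}$ are choices made here.

```python
def box_kernel(v):
    """Rectangular kernel on [-1/2, 1/2]; it integrates to one."""
    return 1.0 if -0.5 <= v <= 0.5 else 0.0


def two_stage_nls_p1(x, t0, b, kernel=box_kernel):
    """Two-stage kernel-NLS estimate of (a_0(u_0), a_1(u_0)) for tvARCH(1).

    x[k-1] holds X_{k,N}; t0 is the time point and b the bandwidth.
    Returns the stage-one local mean and the stage-two estimates.
    """
    n = len(x)
    sq = [v * v for v in x]
    bn = b * n
    # Stage 1: kernel-weighted local mean of the squares, as in (15).
    mu_hat = sum(kernel((t0 - k) / bn) / bn * sq[k - 1] for k in range(1, n + 1))
    # Stage 2: least squares normalised by (mu_hat + S_{k-1,N})^2, as in (16).
    r00 = r01 = r11 = q0 = q1 = 0.0
    for k in range(2, n + 1):
        w = kernel((t0 - k) / bn) / bn
        if w == 0.0:
            continue  # observation lies outside the local window
        s_prev = sq[k - 2]  # S_{k-1,N} = X_{k-1,N}^2 when p = 1
        scale = w / (mu_hat + s_prev) ** 2
        r00 += scale
        r01 += scale * s_prev
        r11 += scale * s_prev * s_prev
        q0 += scale * sq[k - 1]
        q1 += scale * sq[k - 1] * s_prev
    det = r00 * r11 - r01 * r01
    a0_hat = (r11 * q0 - r01 * q1) / det
    a1_hat = (r00 * q1 - r01 * q0) / det
    return mu_hat, (a0_hat, a1_hat)
```

The normalising weight is bounded below by the estimated local mean, so no positivity constraint on the coefficients is needed to evaluate it, in line with the motivation given in Section 3.3.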

3.5. Asymptotic properties of the two-stage kernel-NLS estimator. We 
derive the asymptotic sampling properties of $\tilde{\underline{a}}_{t_0,N}$. [We note that because 
in the first stage we need to estimate the weight function $\kappa(u_0, \mathcal{X}_{k-1,N}) = 
\mu(u_0) + S_{k-1,N}$, we require the additional mild Assumption 1(iv), which we 
use to obtain a rate of convergence for $|\hat{\mu}_{t_0,N} - \mu(u_0)|$.] 

In the following proposition we obtain consistency and asymptotic normality 
of $\tilde{\underline{a}}_{t_0,N}$. 

Proposition 4. Suppose $\{X_{t,N}\}_t$ is a tvARCH(p) process which satisfies 
Assumption 1(i), (ii), and let $\hat{\mu}_{t_0,N}$, $\tilde{\underline{a}}_{t_0,N}$, $\mathcal{A}_t^{(\mu)}(u)$ and $\mathcal{D}_t^{(\mu)}(u)$ be defined 
as in (15), the two-stage scheme and (12), respectively. Further, let 
$\mu(u) = E(X_t^2(u))$, and suppose $|u_0 - t_0/N| < 1/N$. 

(i) Then we have $\tilde{\underline{a}}_{t_0,N} \stackrel{P}{\to} \underline{a}(u_0)$, with $b \to 0$, $bN \to \infty$ as $N \to \infty$. 

(ii) If in addition we assume that Assumption 1(iii), (iv) holds, then we 
have 

(17)  $\sqrt{bN}\,(\tilde{\underline{a}}_{t_0,N} - \underline{a}(u_0)) + \tfrac{1}{2} \sqrt{bN}\, \{E[\mathcal{A}_t^{(\mu)}(u_0)]\}^{-1} \nabla \mathcal{B}_{t_0,N}^{(\mu)}(\underline{a}(u_0)) \stackrel{D}{\to} \mathcal{N}\left(0, w_2 \mu_4 \{E[\mathcal{A}_t^{(\mu)}(u_0)]\}^{-1} E[\mathcal{D}_t^{(\mu)}(u_0)] \{E[\mathcal{A}_t^{(\mu)}(u_0)]\}^{-1}\right),$

where $\nabla \mathcal{B}_{t_0,N}^{(\mu)}(\underline{a}(u_0)) = O_p(b^{\beta})$ and $w_2$ and $\mu_4$ are defined as in Proposition 3, 
with $b \to 0$, $bN \to \infty$ as $N \to \infty$. 

Comparing the two-stage kernel-NLS estimator with the kernel-QML estimator 
in Dahlhaus and Subba Rao (2006), it is easily seen that they both 
have the same rate of convergence. 

Remark 1 (An asymptotically optimal estimator). We recall that the 
oracle estimator asymptotically has the same variance as the kernel-QML 
estimator, but in practice the oracle weight is never known. However, the 
two-stage kernel-NLS estimator can be used as the basis of an estimate of the 
oracle weight. In other words, using the two-stage kernel-NLS estimator, we 
define the weight function $\hat{\sigma}_{k,N}^2(u_0) = \tilde{a}_{t_0,N}(0) + \sum_{j=1}^{p} \tilde{a}_{t_0,N}(j) X_{k-j,N}^2$, where 
$\tilde{\underline{a}}_{t_0,N} = (\tilde{a}_{t_0,N}(0), \ldots, \tilde{a}_{t_0,N}(p))$. Then, we use $\check{\underline{a}}_{t_0,N}$ as an estimator of $\underline{a}(u_0)$, 
where $\check{\underline{a}}_{t_0,N} = \{\check{\mathcal{R}}_{t_0,N}\}^{-1} \check{\underline{r}}_{t_0,N}$, and $\check{\mathcal{R}}_{t_0,N}$ and $\check{\underline{r}}_{t_0,N}$ are defined in the same 
way as $\mathcal{R}_{t_0,N}$ and $\underline{r}_{t_0,N}$, with $\hat{\sigma}_{k,N}^2(u_0)$ replacing $(\hat{\mu}_{t_0,N} + \sum_{j=1}^{p} X_{k-j,N}^2)$. The 
asymptotic sampling results can be derived using a similar proof to Proposition 4. 
More precisely, if Assumption 1 holds, $b^{\beta} \sqrt{bN} \to 0$, and $a_j(u_0) > 0$ 
for all $j$, then we have 

(18)  $\sqrt{bN}\,(\check{\underline{a}}_{t_0,N} - \underline{a}(u_0)) \stackrel{D}{\to} \mathcal{N}\left(0, w_2 \mu_4 \{E[\Sigma_t(u_0)]\}^{-1}\right).$

In other words, by using the two-stage kernel-NLS estimator, we are able 
to estimate the oracle weight sufficiently well for the parameter to have 
the same asymptotic variance as the kernel-QML estimator. We note that, 
similarly to the kernel-QML estimator, we require that $\inf_j a_j(u) > 0$. However, 
it is suggested in the examples in Section 6 that over large periods 
of time some of the higher-order parameters should be zero. This renders 
the assumption $\inf_j a_j(u) > 0$ rather unrealistic. Furthermore, to estimate 
$\check{\underline{a}}_{t_0,N}$, we require an additional stage of computation, which significantly 
increases computation time in tasks such as cross-validatory bandwidth choice 
or evaluation of bootstrap confidence intervals. Also, small sample evidence 
suggests that the performance of the estimators $\tilde{\underline{a}}_{t_0,N}$ and $\check{\underline{a}}_{t_0,N}$ is similar. 

For this reason, in the rest of this paper, we focus on $\tilde{\underline{a}}_{t_0,N}$, though our 
results can be generalized to $\check{\underline{a}}_{t_0,N}$. 

Table 1 

Ratios of Mean Absolute Errors of two-stage NLS and QML estimators, averaged over 
100 simulated sample paths, for stationary ARCH(2) estimation with Gaussian errors $Z_t$ 
and $(a_0, a_1, a_2) = (1, 0.6, 0.3)$. Sample sizes vary from N = 15 to N = 250 

         N = 15   N = 30   N = 60   N = 100   N = 150   N = 250 
$a_0$     0.59     0.69     0.91     1.04      0.96      1.28 
$a_1$     0.84     0.73     0.97     0.97      1.10      1.11 
$a_2$     0.64     0.68     0.86     0.98      0.94      1.08 

3.6. Comparison of two-stage kernel-NLS and kernel-QML estimators for 
small samples. As mentioned earlier, in a nonstationary setting, it is essen- 
tial for any estimator of tvARCH(p) parameters to perform well for small 
sample sizes. We now briefly describe the outcome of an extensive simula- 
tion study aimed at comparing the performance of the two-stage NLS and 
QML estimators on short stretches of stationary ARCH(2) data. We have 
tested the two estimators for Gaussian, Laplace and Student-t errors $Z_t$,
and for various points of the parameter space $(a_0, a_1, a_2)$. The two-stage
NLS estimator significantly outperformed the QML estimator for very small 
sample sizes in almost all of the cases. More complicated patterns emerged 
for sample sizes of about 150 and larger, where the performance depended 
on the particular point of the parameter space. However, the two-stage NLS 
estimator was never found to perform much worse than the QML estimator. 
We also found the two-stage NLS estimator to be significantly faster than 
the QML estimator as it did not involve an iterative optimization procedure. 

As an example, Table 1 shows the ratios of the mean absolute errors of the
two-stage NLS and QML estimators, averaged over 100 simulated sample
paths, for the following parameter configuration: $(a_0, a_1, a_2) = (1, 0.6, 0.3)$
(values below one favor the two-stage NLS estimator).
The errors $Z_t$ are Gaussian. The above point of the parameter space is
"typical" in the sense that it lies in the interior of the parameter space
(and thus is suitable for QML estimation which requires $a_1, a_2 > 0$) and
that $a_1 > a_2$ as expected in a real-data setting. Also, it is interesting in
that $a_1 + a_2 > 1/\sqrt{3}$ and thus the classical (nonnormalized) least-squares
estimator, corresponding to $\kappa(\cdot) \equiv 1$, would not be consistent in this setup.
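
To make the comparison above more concrete, the following is a minimal sketch of a two-stage normalized least-squares fit on simulated stationary ARCH(2) data with $(a_0, a_1, a_2) = (1, 0.6, 0.3)$. It assumes that the first stage is the plain sample mean of $X_t^2$ and that the second stage weights each observation by $(\hat\mu + \sum_j X_{k-j}^2)^{-2}$; the function names `simulate_arch`, `solve` and `two_stage_nls` are illustrative and not taken from the paper.

```python
import random

def simulate_arch(a, n, burn=500, seed=0):
    """Simulate squared values X_t^2 of a stationary ARCH(p) process, Gaussian errors."""
    rng = random.Random(seed)
    p = len(a) - 1
    x2 = [0.0] * p                                   # zero start-up values
    for _ in range(n + burn):
        sigma2 = a[0] + sum(a[j] * x2[-j] for j in range(1, p + 1))
        x2.append(sigma2 * rng.gauss(0.0, 1.0) ** 2)
    return x2[-n:]                                   # drop start-up and burn-in

def solve(A, y):
    """Solve A z = y by Gaussian elimination with partial pivoting."""
    m = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * m
    for r in range(m - 1, -1, -1):
        z[r] = (M[r][m] - sum(M[r][k] * z[k] for k in range(r + 1, m))) / M[r][r]
    return z

def two_stage_nls(x2, p):
    """Stage 1: mu = sample mean of X_t^2 (an assumption of this sketch).
    Stage 2: least squares with weights (mu + S_{k-1})^{-2}."""
    mu = sum(x2) / len(x2)
    R = [[0.0] * (p + 1) for _ in range(p + 1)]
    r = [0.0] * (p + 1)
    for k in range(p, len(x2)):
        reg = [1.0] + [x2[k - j] for j in range(1, p + 1)]
        w = 1.0 / (mu + sum(reg[1:])) ** 2
        for i in range(p + 1):
            r[i] += w * x2[k] * reg[i]
            for j in range(p + 1):
                R[i][j] += w * reg[i] * reg[j]
    return solve(R, r)                               # closed form, no iteration

est = two_stage_nls(simulate_arch([1.0, 0.6, 0.3], n=20000), p=2)
```

The estimate is obtained from a single linear solve, which is the reason no iterative optimizer is needed.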

4. A cross-validation method for bandwidth selection and implementation.
In this section, we propose a data-driven method for selecting the
bandwidth of the two-stage kernel-NLS estimator.

4.1. The cross-validation bandwidth estimator. Several cross-validation 
methods in nonparametric statistics consider the distance between an ob- 
servation and a predictor of that observation given neighboring observations. 
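For example, Hart (1996) used a cross-validation method based on the best
linear predictor of $Y_t$ given the past to select the bandwidth of a kernel
smoother, where $Y_t$ was a nonparametric function plus correlated noise. The
methodology we propose is based on the best linear predictor of $X_{t,N}^2$ given
the past, which is $a_0(t/N) + \sum_{j=1}^{p} a_j(t/N) X_{t-j,N}^2$.

We estimate the parameters $\{a_j(t/N)\}_j$ using the localized two-stage
kernel-NLS method but omit the observation $X_{t,N}^2$ in the estimation. More
precisely, we use $\tilde a_{t,N}^{-t}(b) = (\tilde a_0^{-t}(b), \ldots, \tilde a_p^{-t}(b))$
as an estimator of $\{a_j(t/N)\}_j$, where

(19) $\tilde a_{t,N}^{-t}(b) = \{\mathcal{R}_{t,N}^{-t}(b)\}^{-1} r_{t,N}^{-t}(b),$

with

$\mathcal{R}_{t,N}^{-t}(b) = \sum_{k=p+1,\, k \neq t,\ldots,t+p}^{N} \frac{1}{bN} W\left(\frac{t-k}{bN}\right) \frac{\mathcal{X}_{k-1,N} \mathcal{X}_{k-1,N}^T}{(\hat\mu_{t,N} + S_{k-1,N})^2},$

$r_{t,N}^{-t}(b) = \sum_{k=p+1,\, k \neq t,\ldots,t+p}^{N} \frac{1}{bN} W\left(\frac{t-k}{bN}\right) \frac{X_{k,N}^2 \mathcal{X}_{k-1,N}}{(\hat\mu_{t,N} + S_{k-1,N})^2},$

where $\mathcal{X}_{k-1,N} = (1, X_{k-1,N}^2, \ldots, X_{k-p,N}^2)^T$ and
$S_{k-1,N} = \sum_{j=1}^{p} X_{k-j,N}^2$.

By using $\tilde a_{t,N}^{-t}(b)$, the squared error in predicting $X_{t,N}^2$ is given by
$\{X_{t,N}^2 - \tilde a_0^{-t}(b) - \sum_{j=1}^{p} \tilde a_j^{-t}(b) X_{t-j,N}^2\}^2$.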

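To reduce the complexity, we suggest only evaluating the cross-validation
criterion on a subsample of the observations. Let $h$ be such that $h \to \infty$,
$N/h \to \infty$ as $N \to \infty$ (in practice $h \geq p$). We implement the
cross-validation criterion on only the subsampled observations
$\{X_{kh,N} : k = 1, \ldots, N/h\}$. In other words, let
$\tilde a_{kh,N}^{-kh}(b) = (\tilde a_0^{-kh}(b), \ldots, \tilde a_p^{-kh}(b))$ be the estimator
defined in (19) and by normalizing the squared error with the term
$(\hat\mu_{kh,N} + \sum_{j=1}^{p} X_{kh-j,N}^2)^2$ we define the following
cross-validation criterion

(20) $G_{N,h}(b) = \frac{h}{N} \sum_{k=1}^{N/h} \frac{\{X_{kh,N}^2 - \tilde a_0^{-kh}(b) - \sum_{j=1}^{p} \tilde a_j^{-kh}(b) X_{kh-j,N}^2\}^2}{(\hat\mu_{kh,N} + \sum_{j=1}^{p} X_{kh-j,N}^2)^2}.$

We then use $\hat b_{opt}$ as the optimal bandwidth, where
$\hat b_{opt} = \arg\min_b G_{N,h}(b)$. Using similar arguments to those in
Hart (1996), asymptotically, one can show that $G_{N,h}(b)$ is equivalent to
the mean-squared error $\tilde G_{N,h}(b)$, where

(21) $\tilde G_{N,h}(b) = \frac{h}{N} \sum_{k=1}^{N/h} E\left[ \frac{\{X_{kh,N}^2 - \tilde a_0^{-kh}(b) - \sum_{j=1}^{p} \tilde a_j^{-kh}(b) X_{kh-j,N}^2\}^2}{(\hat\mu_{kh,N} + \sum_{j=1}^{p} X_{kh-j,N}^2)^2} \right].$

It follows that $\hat b_{opt}$ is an estimator of $b_{opt}$, where
$b_{opt} = \arg\min_b \tilde G_{N,h}(b)$. $\tilde G_{N,h}(b)$ is minimized if
$\tilde a_{kh,N}^{-kh}(b) = a(kh/N)$ and in that case it is asymptotically equal to

$\int_0^1 E\left\{ \frac{(Z_t^2 - 1)^2 \{a_0(u) + \sum_{j=1}^{p} a_j(u) X_{t-j}^2(u)\}^2}{(\mu(u) + \sum_{j=1}^{p} X_{t-j}^2(u))^2} \right\} du.$

Therefore, $\hat b_{opt}$ is such that $\tilde a_{t,N}^{-t}(\hat b_{opt})$ is close to $a(t/N)$.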

It is straightforward to show that the computational complexity of this
algorithm is $O(BN \log N)$, where $B$ is the cardinality of the set of
bandwidths tested for the minimum of the cross-validation criterion. We note
that the above rate is unattainable for the kernel-QML estimator due to its 
iterative character. 
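
The cross-validation idea of this section can be sketched as follows for the case p = 1, where the local normal equations are 2 x 2 and can be solved in closed form. The sketch makes several simplifying assumptions that are not part of the method as stated: a rectangular kernel, the global sample mean of $X_t^2$ as $\hat\mu$, and a naive loop whose cost is higher than the rate quoted above. All function names are hypothetical.

```python
import random

def local_nls_p1(x2, t, b, mu, omit):
    """Kernel-NLS fit of (a0, a1) around index t, skipping the indices in `omit`.
    Rectangular kernel of total width about b * len(x2) (assumption of this sketch)."""
    n = len(x2)
    half = max(int(b * n / 2), 2)
    s00 = s01 = s11 = r0 = r1 = 0.0
    for k in range(max(1, t - half), min(n, t + half + 1)):
        if k in omit:
            continue
        w = 1.0 / (mu + x2[k - 1]) ** 2          # normalisation (mu + S_{k-1})^{-2}
        s00 += w
        s01 += w * x2[k - 1]
        s11 += w * x2[k - 1] ** 2
        r0 += w * x2[k]
        r1 += w * x2[k] * x2[k - 1]
    det = s00 * s11 - s01 * s01
    return (s11 * r0 - s01 * r1) / det, (s00 * r1 - s01 * r0) / det

def cv_criterion(x2, b, h=10):
    """Average normalised leave-out squared prediction error on t = h, 2h, ..."""
    mu = sum(x2) / len(x2)
    total, count = 0.0, 0
    for t in range(h, len(x2), h):
        # omit k = t, ..., t + p (here p = 1) from the local fit
        a0, a1 = local_nls_p1(x2, t, b, mu, omit={t, t + 1})
        err = x2[t] - a0 - a1 * x2[t - 1]
        total += err ** 2 / (mu + x2[t - 1]) ** 2
        count += 1
    return total / count

def select_bandwidth(x2, grid, h=10):
    """Return the bandwidth in `grid` with the smallest criterion value."""
    return min(grid, key=lambda b: cv_criterion(x2, b, h))

# Toy data: stationary ARCH(1) squares with (a0, a1) = (1, 0.4), illustration only.
rng = random.Random(0)
x2, prev = [], 0.0
for _ in range(600):
    prev = (1.0 + 0.4 * prev) * rng.gauss(0.0, 1.0) ** 2
    x2.append(prev)
grid = [0.05, 0.1, 0.2, 0.4]
b_hat = select_bandwidth(x2, grid)
```

Because each candidate bandwidth only requires closed-form local fits, scanning a grid of bandwidths stays cheap, which is the practical point made above.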



4.2. An illustrative example. We illustrate the performance of the proposed
cross-validation criterion by an interesting example of a tvARCH(1)
process for which the parameters $a_0(\cdot)$ and $a_1(\cdot)$ vary over time but the
asymptotic unconditional variance $E(X_t^2(u)) = a_0(u)/(1 - a_1(u))$ remains
constant. This means that sample paths of $\{X_{t,N}\}_t$ will invariably appear
stationary on visual inspection, and that more sophisticated techniques are 
needed to detect the nonstationarity. 
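
A process of this kind can be simulated with a few lines of code. The sketch below uses a hypothetical choice of $a_1(u)$ (a sine curve) and sets $a_0(u) = 1 - a_1(u)$ so that $a_0(u)/(1 - a_1(u)) = 1$ for all $u$; these curves are assumptions for illustration and are not the ones used for Figure 1.

```python
import math
import random

def a1(u):
    # Hypothetical smooth curve with values in (0, 1); not the paper's choice.
    return 0.3 + 0.25 * math.sin(2.0 * math.pi * u)

def a0(u, var=1.0):
    # Chosen so that a0(u) / (1 - a1(u)) equals `var` for every u.
    return var * (1.0 - a1(u))

def simulate_tvarch1(n, seed=0):
    """Simulate X_{t,N} = sigma_{t,N} Z_t with sigma^2 = a0(t/N) + a1(t/N) X_{t-1,N}^2."""
    rng = random.Random(seed)
    x, prev2 = [], 0.0
    for t in range(1, n + 1):
        u = t / n
        sigma2 = a0(u) + a1(u) * prev2
        xt = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)   # standard Gaussian errors
        x.append(xt)
        prev2 = xt * xt
    return x

path = simulate_tvarch1(1024)
```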

The left-hand plot in Figure 1 shows a sample path of length 1024, sim- 
ulated from the above process using standard Gaussian errors. The true 
time-varying parameters $a_0(\cdot)$ and $a_1(\cdot)$ are displayed as dotted lines in
the middle and right-hand plots, respectively. In the estimation procedure,
we used the Parzen kernel (a convolution of the rectangular and triangular
kernels) and, for simplicity, set $\hat\mu_{t,N}$ to be the sample mean of $\{X_{t,N}^2\}_t$.
To estimate a suitable bandwidth, we applied the proposed cross-validation
procedure described above with h = 10 (empirically, we have found that for
data of length of order 1000, the value h = 10 offers a good compromise
between speed and accuracy of our method). We examined the value of the
cross-validation criterion over a regular grid of bandwidths between 0 and
1, and obtained the optimal bandwidth as $\hat b_{opt} = 0.132$.

The resulting parameter estimates are shown in the middle and right-hand 
plots of Figure 1 as solid lines. While we can clearly observe a degree of bias 
due to the small sample sizes involved in the estimation, it is reassuring to 
see that the resulting estimates correctly trace the shape of the underlying 
parameters. Denoting the empirical residuals from the fit by $\hat Z_t$, the p-value
of the Kolmogorov-Smirnov test for Gaussianity of $\hat Z_t$ was 0.08, and the
p-values of the Ljung-Box test for lack of serial correlation in $\hat Z_t$,
$|\hat Z_t|$ and $\hat Z_t^2$ were 0.71, 0.33 and 0.58, respectively.

Fig. 1. Dotted lines in the middle and right plots: the true time-varying parameters $a_0(u)$
and $a_1(u)$, respectively. The left plot: a sample path from the model, with Gaussian errors.
Solid lines in the middle and right plots: the corresponding estimates. See Section 4.3 for
details.

5. Constructing bootstrap pointwise confidence intervals. In parameter
estimation of linear time series, bootstrap methods are often used to obtain
a good finite sample approximation of the distribution of the parameter
estimators. Schemes based on estimating the residuals are often used
[Franke and Kreiss (1992)]. Inspired by these methods, we propose a bootstrap
scheme for the tvARCH(p) process, which we use to construct pointwise
confidence intervals for the two-stage kernel-NLS estimator. The main
idea of the scheme is to use the two-stage kernel-NLS estimator to estimate
the residuals. We construct the empirical distribution from the estimated
residuals, sample from it and use this to construct the bootstrap tvARCH(p)
sample. We show that the distribution of the two-stage kernel-NLS estimator
using the bootstrap tvARCH(p) sample and the "true" tvARCH(p) estimator
asymptotically coincide. We mention that the scheme and the asymptotic
results derived here are also of independent interest and can be used to
bootstrap stationary ARCH(p) processes [for a recent review on resampling and
subsampling financial time series in the stationary context, see Paparoditis
and Politis (2007)]. We emphasize that unlike the kernel-QML estimator,
this computer-intensive procedure is feasible for the kernel-NLS estimator
due to its rapid computability.

Let $\tilde a_{t_0,N} = (\tilde a_{t_0,N}(0), \ldots, \tilde a_{t_0,N}(p))$. We first note that
Assumption 1(i) is usually imposed in the tvARCH framework because it
guarantees that almost surely every realization of the resulting process is
bounded. When the sum of the coefficients is greater than one, the
corresponding process is unstable. The following residual bootstrap scheme
constructs the tvARCH(p) process from estimates of the residuals and the
parameter estimators. Despite $\tilde a_{t_0,N} \to_P a(u_0)$, it is not necessarily
true that the sum of the parameter estimates satisfies
$\sum_{j=1}^{p} \tilde a_{t_0,N}(j) < 1$. To overcome this, we now define a very
slight modification of the two-stage kernel-NLS estimator which guarantees
that this sum is less than one. Let
$\bar a_{t_0,N} = (\bar a_{t_0,N}(0), \ldots, \bar a_{t_0,N}(p))$, where
$\bar a_{t_0,N}(0) = \tilde a_{t_0,N}(0)$ and, for $j \geq 1$,

(23) $\bar a_{t_0,N}(j) = \begin{cases} \tilde a_{t_0,N}(j), & \text{if } \sum_{j=1}^{p} \tilde a_{t_0,N}(j) \leq 1 - \delta, \\ (1 - \delta) \dfrac{\tilde a_{t_0,N}(j)}{\sum_{j=1}^{p} \tilde a_{t_0,N}(j)}, & \text{otherwise.} \end{cases}$

Since $\tilde a_{t_0,N} \to_P a(u_0)$ and $\sum_{j=1}^{p} a_j(u) \leq 1 - \delta$
[Assumption 1(i)], it is straightforward to see that
$\bar a_{t_0,N} \to_P a(u_0)$ and $\sum_{j=1}^{p} \bar a_{t_0,N}(j) \leq 1 - \delta$.
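
As an illustration of the modification just described, the following sketch leaves the coefficient estimates unchanged when the sum of the ARCH coefficients is at most $1 - \delta$ and otherwise rescales them so that the sum equals $1 - \delta$. The rescaling rule is our reading of (23), and the function name `stabilise` and the default value of `delta` are hypothetical.

```python
def stabilise(a_hat, delta=0.05):
    """a_hat = (a(0), a(1), ..., a(p)). The intercept a(0) is never changed.
    If a(1) + ... + a(p) exceeds 1 - delta, shrink a(1..p) proportionally."""
    s = sum(a_hat[1:])
    if s <= 1.0 - delta:
        return list(a_hat)
    scale = (1.0 - delta) / s
    return [a_hat[0]] + [scale * a for a in a_hat[1:]]

unchanged = stabilise([1.0, 0.3, 0.2])
rescaled = stabilise([1.0, 0.8, 0.4])
```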
The residual bootstrap of the tvARCH(p) process:

(i) If $k \in [t_0 - bN, t_0 + bN - 1]$, using the parameter estimators construct
residuals

$\tilde Z_k^2 = \frac{X_{k,N}^2}{\bar a_{t_0,N}(0) + \sum_{j=1}^{p} \bar a_{t_0,N}(j) X_{k-j,N}^2}.$

(ii) Define $\hat Z_t^2 = \tilde Z_t^2 - \frac{1}{2bN} \sum_{k=t_0-bN}^{t_0+bN-1} \tilde Z_k^2 + 1$
and consider the empirical distribution function

$\hat F_{t_0,N}(x) = \frac{1}{2bN} \sum_{k=t_0-bN}^{t_0+bN-1} 1_{(-\infty,x]}(\hat Z_k^2),$

where $1_A(y) = 1$ if $y \in A$, 0 otherwise. It is worth mentioning that we use
$\hat Z_t^2$ rather than $\tilde Z_t^2$ since we have
$E(Z_t^{+2}) = \int z \hat F_{t_0,N}(dz) = 1$. (This result is used
in Proposition 6 in the Appendix.)

Set $X_t^{+2}(u_0) = 0$ for $t \leq 0$. For $1 \leq t \leq t_0 + bN/2$, sample from the
distribution function $\hat F_{t_0,N}(x)$, to obtain the sample $\{Z_t^{+2}\}_t$. Use
this to construct the bootstrap sample

$X_t^{+2}(u_0) = \sigma_t^{+2}(u_0) Z_t^{+2}, \qquad \sigma_t^{+2}(u_0) = \bar a_{t_0,N}(0) + \sum_{j=1}^{p} \bar a_{t_0,N}(j) X_{t-j}^{+2}(u_0).$

We note that by estimating the residuals from $[t_0 - bN, t_0 + bN - 1]$, the
distribution of $X_t^{+2}(u_0)$ will be suitably close to the stationary
approximation $X_t^2(u_0)$ when $t \in [t_0 - bN/2, t_0 + bN/2 - 1]$; this allows us
to obtain the sampling properties of the bootstrap estimator.

(iii) Define the bootstrap estimator

(24) $\hat a_{t_0,N}^{+} = \{\mathcal{R}_{t_0,N}^{+}\}^{-1} r_{t_0,N}^{+},$

where $\mathcal{X}_{t-1}^{+}(u_0) = (1, X_{t-1}^{+2}(u_0), \ldots, X_{t-p}^{+2}(u_0))^T$ and



, , 6iV V bN 

k=p+l 



We observe that in steps (i), (ii) of the bootstrap scheme we are constructing
the bootstrap sample $\{X_t^{+2}(u_0)\}_t$ whose distribution should emulate the
distribution of the stationary approximation $\{X_t^2(u_0)\}_t$. In step (iii) of the
bootstrap scheme we are constructing the bootstrap estimator $\hat a_{t_0,N}^{+}$ from
the bootstrap samples. We note that we have bootstrapped the stationary
approximation $X_t^2(u_0)$ since the limiting distribution of $\tilde a_{t_0,N}$ is
derived using the stationary approximation.
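
The resampling part of the scheme can be sketched for a stationary ARCH(p) series as follows. This is an illustration under stated assumptions rather than the procedure itself: all residuals are used instead of a local window around $t_0$, the recentring is additive (subtract the mean, add one), which is our reading of step (ii), and the names `residual_bootstrap`, `a_bar` and `n_boot` are hypothetical.

```python
import random

def residual_bootstrap(x2, a_bar, n_boot, seed=0):
    """Residual bootstrap of squared ARCH(p) values; a_bar = (a(0), ..., a(p))."""
    rng = random.Random(seed)
    p = len(a_bar) - 1
    # Step (i): squared residuals X_k^2 / sigma_k^2 from the fitted coefficients.
    z2 = []
    for k in range(p, len(x2)):
        sigma2 = a_bar[0] + sum(a_bar[j] * x2[k - j] for j in range(1, p + 1))
        z2.append(x2[k] / sigma2)
    # Step (ii): recentre so that the resampling distribution has mean one.
    mean = sum(z2) / len(z2)
    z2 = [z - mean + 1.0 for z in z2]
    # Draw from the empirical distribution and rebuild the squared process,
    # starting from zero values before time 1.
    boot = [0.0] * p
    for _ in range(n_boot):
        sigma2 = a_bar[0] + sum(a_bar[j] * boot[-j] for j in range(1, p + 1))
        boot.append(sigma2 * rng.choice(z2))
    return boot[p:], z2

# Toy input: ARCH(1) squares with (a0, a1) = (1, 0.5), for illustration only.
rng = random.Random(1)
x2, prev = [], 0.0
for _ in range(300):
    prev = (1.0 + 0.5 * prev) * rng.gauss(0.0, 1.0) ** 2
    x2.append(prev)
boot, z2 = residual_bootstrap(x2, a_bar=[1.0, 0.5], n_boot=200)
```

Repeating the last call with different seeds and re-estimating the parameters on each bootstrap path gives the empirical quantiles from which pointwise intervals can be read off.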

We now show that the distributions of $\sqrt{bN}\{\hat a_{t_0,N}^{+} - \bar a_{t_0,N}\}$ and
$\sqrt{bN}\{\tilde a_{t_0,N} - a(u_0)\}$ asymptotically coincide.

Proposition 5. Suppose Assumption 1 holds, and suppose either
$\inf_j a_j(u_0) > 0$ or $\{E(Z_t^4)\}^{1/2} \sup_u[\sum_{j=1}^{p} a_j(u)] \leq 1 - \delta$ [which implies
$\sup_k E(X_{k,N}^4) < \infty$]. Let $\bar a_{t_0,N}$ and $\hat a_{t_0,N}^{+}$ be defined as in (23) and (24),
respectively, and let $b^2 \sqrt{bN} \to 0$. If $|u_0 - t_0/N| < 1/N$, then we have

^{af N -at^,N) 



to,N 

with $b \to 0$, $bN \to \infty$ as $N \to \infty$.

Comparing the results in Proposition 4(ii) and Proposition 5, we see that if
$b^2 \sqrt{bN} \to 0$, then, asymptotically, the distributions of
$(\hat a_{t_0,N}^{+} - \bar a_{t_0,N})$ and $(\tilde a_{t_0,N} - a(u_0))$ are the same.

6. Volatility estimation: real data examples. The datasets analyzed in 
this and the following section fall into two categories: 

1. Logged and differenced daily exchange rates between USD and a number 
of other currencies running from January 1, 1990 to December 31, 1999: 
the data are available from the US Federal Reserve website: 
www.federalreserve.gov/releases/h10/Hist/default1999.htm. We
use the following acronyms: CHF (Switzerland Franc), GBP (United 
Kingdom Pound), HKD (Hong Kong Dollar), JPY (Japan Yen), NOK 
(Norway Kroner), NZD (New Zealand Dollar), SEK (Sweden Kronor), 
TWD (Taiwan New Dollar). 
2. Logged and differenced daily closing values of the NIKKEI, FTSE,
S&P500 and DAX indices, measured between a date in 1996 (exact
dates vary) and April 29, 2005: the data are available from: 
www.bossa.pl/notowania/daneatech/metastock/. 

The lengths N of each dataset vary but oscillate around 2500. In this 
section, we exhibit the estimation performance of the two-stage kernel-NLS 
estimator on the USD/GBP exchange rate and FTSE series. We examine 
the cases p = 0, 1, 2 and use the Parzen kernel with bandwidths selected by 
the cross-validation algorithm of Section 4.2. 

The left column in Figure 2 shows the results for USD/GBP. The top 
plot shows the data, the next one down shows the estimates of $a_0(\cdot)$ for
p = 0 (dashed line), p = 1 (dotted line) and p = 2 (solid line), the one below
displays the positive parts of the estimates of $a_1(\cdot)$ for p = 1 (dotted) and
p = 2 (solid), and the bottom plot shows the positive part of the estimate of
$a_2(\cdot)$ for p = 2. Note that the negative values arise since our estimator is not
guaranteed to be nonnegative. The right column shows the corresponding 
quantities for the FTSE data. It is interesting to observe that in both cases, 
the shapes of the estimated time- varying parameters are similar for different 
values of p. 

The goodness of fit for each choice of p = 0, 1, 2 is assessed in Table 2. In 
each case, $\hat Z_t$ denotes the sequence of empirical residuals from the given fit.
For the USD/GBP data, the best fit is obtained for p = 1. For the FTSE
data, it is less clear which order gives the best fit but the Ljung-Box (L-B)
p-value for $|\hat Z_t|$ is the highest for p = 0 and thus it seems to be the
preferred option, which is further confirmed by the visual inspection of the
sample autocorrelation function of $|\hat Z_t|$ in the three cases. In both cases, the
empirical residuals are negatively skewed, and in the case of USD/GBP they 
are also heavy-tailed. 

We conclude this section by constructing bootstrap pointwise confidence 
intervals for the estimated parameters, using the algorithm detailed in Sec- 
tion 5. Note that our central limit theorem (CLT) of Proposition 4 could be 
used for the same purpose, but this would require pre-estimation of a number 
of quantities, which we wanted to avoid. We base our bootstrap pointwise 
confidence intervals on 100 bootstrap samples. For clarity, we only display 
confidence intervals for the "preferred" orders p: that is, for p = 1 in the case
of the USD/GBP data, and p = 0 in the case of the FTSE series. These are
shown in Figure 3. 

It is interesting to note that the pointwise confidence intervals for the
"nonlinearity" parameter $a_1(\cdot)$ in the USD/GBP series are relatively wide
and that the parameter can be viewed as only insignificantly different from
zero most (but not all) of the time. On the other hand, there exist time
intervals where the parameter significantly deviates from zero. This further
confirms the observation made earlier that the order p = 0 is an inferior
modeling choice for this series and that the order p = 1 is preferred.

Fig. 2. Left (right) column: USD/GBP (FTSE) series and the corresponding estimation
results. See Section 6 for details.



7. Volatility forecasting: real data examples. In this section, we describe 
a numerical study whereby the long-term volatility forecasting ability of the 
tvARCH(p) process is compared to that of the stationary GARCH(1, 1) and 
EGARCH(1, 1) processes with standard Gaussian errors. We compute the 
forecasts of the tvARCH(p) process as follows: we use the available data to 

Table 2

The values of bandwidth selected by cross-validation, the p-values of the L-B test for
white noise for $\hat Z_t$, $|\hat Z_t|$, $\hat Z_t^2$, and the sample skewness and kurtosis coefficients for $\hat Z_t$ for
the USD/GBP and FTSE data sets. The boxed value (shown in square brackets) means p-value is below 0.05

                                      USD/GBP                     FTSE
                               p = 0   p = 1   p = 2      p = 0   p = 1   p = 2
Bandwidth                      0.02    0.032   0.04       0.024   0.028   0.028
L-B p-value for $\hat Z_t$     0.83    0.83    0.82       0.15    0.20    0.30
L-B p-value for $|\hat Z_t|$   0.17    0.71    [0.03]     0.10    0.07    0.07
L-B p-value for $\hat Z_t^2$   0.09    0.79    0.26       0.13    0.35    0.52
Skewness of $\hat Z_t$         -0.05   -0.09   -0.08      -0.13   -0.15   -0.16
Kurtosis of $\hat Z_t$         0.7     0.92    1.24       -0.01   0.06    0.15

Fig. 3. Solid lines from left to right: estimates of $a_0(\cdot)$ for USD/GBP, $a_1(\cdot)$ for
USD/GBP, and $a_0(\cdot)$ for FTSE. Dashed lines: the corresponding 80% symmetric
bootstrap pointwise confidence intervals.


estimate the tvARCH(p) parameters, and then forecast into the future using 
the "last" estimated parameter values, that is, those corresponding to the 
right edge of the observed data. For a rectangular kernel with span m, this 
strategy leads to the following algorithm: (a) treat the last m data points as 
if they came from a stationary ARCH(p) process, (b) estimate the stationary 
ARCH(p) parameters on this segment (via the two-stage NLS scheme), and 
(c) forecast into the future as in the classical stationary ARCH(p) forecasting 
theory [for the latter, see, e.g., Bera and Higgins (1993)]. 
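
Step (c) relies on the standard recursion for stationary ARCH(p) forecasts, in which future squared returns on the right-hand side are replaced by their own forecasts. The sketch below assumes that the coefficients have already been estimated on the last m observations; the function name `arch_forecast` and its arguments are illustrative.

```python
def arch_forecast(a, x2_last, horizon):
    """Mean-square-optimal forecasts of X_{t+h}^2, h = 1..horizon, for ARCH(p).
    a = (a_0, ..., a_p); x2_last ends with the most recent squares ..., X_{t-1}^2, X_t^2."""
    p = len(a) - 1
    hist = list(x2_last[-p:]) if p > 0 else []
    out = []
    for _ in range(horizon):
        f = a[0] + sum(a[j] * hist[-j] for j in range(1, p + 1))
        out.append(f)
        hist.append(f)          # unknown future squares are replaced by forecasts
    return out

fc = arch_forecast([1.0, 0.5], x2_last=[4.0], horizon=250)
flat = arch_forecast([2.0], x2_last=[], horizon=3)
```

For an ARCH(1) fit the forecasts decay geometrically towards $a_0/(1 - a_1)$, and for p = 0 they are constant and equal to $a_0$.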

We denote the mean-square-optimal h-step-ahead volatility forecasts at
time t, obtained via the above algorithm, by $\sigma_{t|t+h}^{2,tvARCH(p)}$. Note that to
obtain the analogous quantities, $\sigma_{t|t+h}^{2,GARCH(1,1)}$ and
$\sigma_{t|t+h}^{2,EGARCH(1,1)}$, for the
stationary GARCH(1,1) and EGARCH(1,1) processes, we always use the
entire available dataset, and not only the last m observations.

To test the forecasting ability of the various models, we use the exchange
rate and stock index datasets listed in Section 6. For the tvARCH(p) process,
we take p = 0, 1, 2, and use the forecasting procedure described above with
a rectangular kernel, over a grid of span values m = 50, 100, . . . , 500. Note
that the tvARCH(0) process has the simple form $X_{t,N} = a_0^{1/2}(t/N) Z_t$ and
is also considered by Starica and Granger (2005). We select the span by a
"forward validation" procedure, that is, choose the value of m that yields
the minimum out-of-sample prediction error AMSE defined below.

For the stationary (E)GARCH(1,1) prediction, we use the standard S-Plus
garch and predict routines. The stationary (E)GARCH(1,1) parameters
are re-estimated for each t.

For each t = 1000, . . . , N − 250, we compute the quantities

$\bar\sigma_{t|t+250}^{2,model} = \sum_{h=1}^{250} \sigma_{t|t+h}^{2,model},$

where "model" is one of: tvARCH(0), tvARCH(1), tvARCH(2), GARCH(1,1),
and EGARCH(1,1), and compare them to the "realized" volatility

$\bar X_{t|t+250}^2 = \sum_{h=1}^{250} X_{t+h}^2$

using the scaled aggregated mean square error (AMSE)

$R_{250,1000,N}^{model} = \sum_{t=1000}^{N-250} (\bar\sigma_{t|t+250}^{2,model} - \bar X_{t|t+250}^2)^2,$

where the scaling is by the factor of 1/(N − 1000). For a justification of this
simulation setup, see Starica (2003).
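
The AMSE computation can be sketched as below. The `forecaster` argument is a hypothetical callback that maps the squared returns observed up to time t to a list of `horizon` variance forecasts; the example forecaster is the local mean of the last m squares, which corresponds to a tvARCH(0) fit with a rectangular kernel of span m. The scaling 1/(N − start) follows the text above.

```python
def amse(x2, forecaster, start=1000, horizon=250):
    """Scaled aggregated mean square error over t = start, ..., N - horizon."""
    n = len(x2)
    total = 0.0
    for t in range(start, n - horizon + 1):
        past = x2[:t]                              # X_1^2, ..., X_t^2
        predicted = sum(forecaster(past, horizon)) # aggregated forecast
        realized = sum(x2[t:t + horizon])          # X_{t+1}^2 + ... + X_{t+horizon}^2
        total += (predicted - realized) ** 2
    return total / (n - start)

def local_mean_forecaster(m):
    """tvARCH(0)-style forecast: the mean of the last m squares, repeated."""
    def forecast(past, horizon):
        window = past[-m:]
        return [sum(window) / len(window)] * horizon
    return forecast

# Tiny deterministic check with short series and horizon (illustration only).
flat_series = [1.0] * 20
zero_error = amse(flat_series, local_mean_forecaster(5), start=10, horizon=5)
biased = amse(flat_series, lambda past, h: [2.0] * h, start=10, horizon=5)
```

Forward validation then amounts to evaluating `amse` for each span m on the grid and keeping the minimizer.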

Table 3 lists the AMSEs attained by tvARCH(O), tvARCH(l), tvARCH(2), 
stationary GARCH(1, 1) and stationary EGARCH(1, 1) processes: the best 
results are boxed. The values in brackets indicate the selected span values. 
The bullets for the USD/TWD and USD/HKD series indicate that the
numerical optimizers performing the QML estimation in stationary (E)GARCH(1,1)
processes failed to converge at several points of the series and, therefore, we
were unable to obtain accurate forecasts. We list below some interesting 
conclusions from this study. 

• In most cases, the selected span values m are similar across orders p. 
These values can be taken as an indication of how "variable" the time- 
varying parameters are. Exceptions to this rule occur mostly in data sets 
which are difficult to model, such as the HKD series, which is extremely 
spiky. For the latter series, more thought is needed on how to model it 
accurately in the tvARCH(p) (or indeed any other) framework. 

Table 3

AMSE for long-term forecasts using tvARCH(0), tvARCH(1), tvARCH(2), stationary
GARCH(1,1) and stationary EGARCH(1,1) processes. $R_{250,1000,N}^{(E)GARCH(1,1)}$ is the better
result out of $R_{250,1000,N}^{GARCH(1,1)}$ and $R_{250,1000,N}^{EGARCH(1,1)}$.
Boxed (best) results are shown in square brackets.

Series    Scaling   R^{(E)GARCH(1,1)}   R^{tvARCH(0)}    R^{tvARCH(1)}    R^{tvARCH(2)}
CHF       10**      2395                2371 (500)       [2254] (500)     3030 (500)
GBP       10«       20282               [7660] (250)     9567 (300)       9230 (300)
HKD       10^2      •                   230 (150)        170 (500)        [150] (100)
JPY       10*       [8687]              9713 (350)       9173 (300)       9450 (300)
NOK       10*       1767                [1552] (500)     1875 (250)       2221 (500)
NZD       10*       11890               5270 (50)        4976 (100)       [4955] (150)
SEK       10«       37720               [6639] (250)     6805 (250)       7321 (250)
TWD       10*       •                   [2323] (500)     2372 (500)       2400 (500)
S&P500    10^       [33]                43 (500)         43 (500)         40 (500)
FTSE      10*^      [516]               860 (500)        958 (500)        983 (500)
DAX       10'       [2602]              4492 (150)       4483 (500)       4864 (150)
NIKKEI    10^       [2364]              3418 (100)       3252 (250)       3432 (250)


• For the NZD series, it can clearly be seen how "adding more nonlinearity 
takes away nonstationarity" : as p increases, a larger and larger span m 
is selected, which means that more and more variability in the volatility 
of the data can be attributed to the nonlinearity, rather than the nonsta- 
tionarity. 

• While the tvARCH(p) framework seems superior to stationary (E)GARCH(1,1)
methodology for the currency exchange data, the opposite is true for the 
stock indices. This might be indicative of the fact that stock indices are 
"less nonstationary" than currency exchange series. 

We conclude with a heuristic investigation of the quality of our volatility 
forecasts. Conditioning on the information available up to time t, the
quantity $\bar\sigma_{t|t+250}^2$ predicts the variance of the variable
$\bar X_{t,250} := \sum_{h=1}^{250} X_{t+h}$. By
CLT-type arguments, $\bar X_{t,250}$ is approximately Gaussian, and thus we assess
the quality of the predicted volatility by measuring how often the process
$Y_t := \bar X_{t,250} / \sqrt{\bar\sigma_{t|t+250}^2}$ falls into the desired confidence
intervals for standard Gaussian variables.

However, this is less informative of the quality of the forecasting procedure
than one might hope, the reason being that the process $Y_t$ is strongly
dependent, so it is not reasonable to expect it to take values outside
$(1 - \alpha)100\%$ confidence intervals exactly, or approximately, $100\alpha\%$ of the
time. Figure 4 shows processes $Y_t$ constructed for the GBP, NZD and SEK
series, with the "optimal" forecasting parameters from Table 3 (i.e., those
for which the results are boxed). For $\alpha = 0.05$, the coverages are,
respectively, 100%, 79% and 95%. If the dependence in $Y_t$ were weaker, we
would expect the three coverages to be closer to 95%, provided the
forecasting procedure was "adequate." However, here, the strong dependence
in $Y_t$ causes the variance of the coverage percentages to be high.

Nonetheless, it is reassuring to note that on average, across the datasets, 
we do obtain the correct coverage of around 95%. To see this, let us consider 
the series for which our forecasting procedure is satisfactory [i.e., those for 
which it outperforms (E)GARCH(l, 1) processes], bar the two series: HKD 
and TWD, which are extremely spiky and thus difficult to model and fore- 
cast. These are: CHF, GBP, NOK, NZD, SEK. Table 4 shows the coverages 
for the five series. The average coverage is 94.2%, which is very close to 
the ideal coverage of 95%. Averaging across all series, excluding HKD and 
TWD, we obtain a coverage of 95.7%. 
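
The coverage figures can be computed along the following lines. The sketch assumes that the aggregated forecasts are supplied as a dictionary mapping the forecast origin t to the predicted variance of the next `horizon` returns; the function name `coverage` and this data layout are hypothetical. The last lines re-derive the 94.2% average from the five coverages reported for CHF, GBP, NOK, NZD and SEK.

```python
import math

def coverage(x, sigma2_bar, horizon=250, z=1.96):
    """Share of forecast origins t with |Y_t| <= z, where
    Y_t = (X_{t+1} + ... + X_{t+horizon}) / sqrt(sigma2_bar[t]).
    x[0] holds X_1, so x[t:t+horizon] holds X_{t+1}, ..., X_{t+horizon}."""
    hits = total = 0
    for t, s2 in sigma2_bar.items():
        y = sum(x[t:t + horizon]) / math.sqrt(s2)
        hits += abs(y) <= z
        total += 1
    return hits / total

# Toy check: alternating returns sum to zero over any window of even length.
toy = coverage([1.0, -1.0] * 10, {0: 4.0, 2: 4.0}, horizon=4)

# Average of the five coverages reported for the series listed above.
reported = [99, 100, 98, 79, 95]
average = sum(reported) / len(reported)
```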

APPENDIX: AUXILIARY LEMMAS AND OUTLINE OF PROOFS 

The aim of this Appendix is to sketch the proofs of the results stated in 
the previous sections. The full details can be found in a technical report,
available from the authors or from
http://www.maths.bris.ac.uk/~mapzf/tvarch/trNLS.pdf.

Before proving these results, we first obtain some results related to weighted 
sums of tvARCH(p) processes that we use below. 

In what follows, we use K to denote a generic finite positive constant. 

A.1. Properties of tvARCH(p) processes. Let us define the following
quantity: 

(25) e{4I(^"}. 



Lemma A.1. Suppose the conditions in Proposition 3(i) are satisfied,
let $\mu(u) = E\{X_t^2(u)\}$, and let $A_t(u)$, $V_t(u)$ and $r(u)$ be defined as in (8) and
(25), respectively. If $|u_0 - t_0/N| < 1/N$, then we have:



Table 4

Coverage of 95% Gaussian prediction intervals for our method, using parameter
configurations that gave the best results in Table 3

Series      CHF    GBP     NOK    NZD    SEK
Coverage    99%    100%    98%    79%    95%
(ii)

N 



uq)]. 



k=p+l 

(iii) 



(iv) Suppose further that the conditions in Proposition 3(ii) are satisfied, 
then we have 

with $b \to 0$, $bN \to \infty$ as $N \to \infty$, where $w_2 = \int_{-1/2}^{1/2} W^2(x)\,dx$.

Proof. It is straightforward to derive (i), (ii), (iii) and (iv) using Lemma
A.5 in Dahlhaus and Subba Rao (2006). We omit the details. □



To prove Lemma A.3 below we use the following lemma, whose proof is
based on mixingale arguments. Suppose $1 \leq q < \infty$, and let $\|\cdot\|_q$ denote the
$\ell_q$-norm of a vector.

Lemma A.2. Suppose $\{\phi_k : k = 1, 2, \ldots\}$ is a stochastic process which
satisfies $E(\phi_k) = 0$ and $E(|\phi_k|^q) < \infty$ for some $1 < q \leq 2$. Further, let
$\mathcal{F}_t = \sigma(\phi_t, \phi_{t-1}, \ldots)$, and suppose that there exists a $\rho \in (0, 1)$ such that
$\{E\|E(\phi_k | \mathcal{F}_{k-j})\|_q^q\}^{1/q} \leq K\rho^j$. Then we have

(30) \ E 



k=l 




In Lemma A.3 below we derive rates of convergence for local sums of a
stationary ARCH(p) process. We use this result to prove the long memory
result in Proposition 2.

Let us define the following quantities:

(31) $\mu_1(u, d, h) = E\{X_t^2(u) X_{t+h}^2(u + d)\}, \qquad c(u, d, h) = \mathrm{cov}\{X_t^2(u), X_{t+h}^2(u + d)\},$

and set $\mu_1(u, 0, h) = \mu_1(u, h)$ and $c(u, 0, h) = c(u, h)$. Define also the following
quantities:

(32) $S_{k,bN}(u) = \frac{1}{bN} \sum_{s=kbN}^{(k+1)bN-1} X_s^2(u),$

(33) $S_{k,bN}(u, h, d) = \frac{1}{bN} \sum_{s=kbN}^{(k+1)bN-1} X_s^2(u) X_{s+h}^2(u + d).$

s=kbN 



Lemma A.3. Suppose $\{X_t(u)\}_t$ is a stationary ARCH(p) process defined
as in (2) and suppose the conditions on the parameters $\{a_j(u)\}_j$ and the
innovations $\{Z_t\}$ in Assumption 1(i), (ii), (iv) hold. Let $\mu(u) = E\{X_t^2(u)\}$,
and let $\mu_1(u, d, h)$, $S_{k,bN}(u)$ and $S_{k,bN}(u, h, d)$ be defined as in (31), (32) and
(33) respectively. Then, we have

(34) $\left\{ E\left| \sum_{k=p+1}^{N} \frac{1}{bN} W\left(\frac{t-k}{bN}\right) \{X_k^2(u) - \mu(u)\} \right|^{1+\eta} \right\}^{1/(1+\eta)} \leq K (bN)^{-\eta/(1+\eta)}.$

Further, if $\{E(|Z_t|^{2(2+\zeta)})\}^{1/(2+\zeta)} \sup_u \sum_{j=1}^{p} a_j(u) \leq 1 - \delta$ for some
$0 < \zeta \leq 2$ and $\delta > 0$, then we have

(35) mSkM^^h,d)- ^t,{u,d,h)\\\Xf^l}'/^'+^^^^ 
where the constant K is independent of u and d. 

Proof. We will first prove (34). We use Lemma A. 2, with = VF(^), 
= {X|(ti) — /i(ti)} and q=l + ri. It can be shown that 

{||E(12(^)|^,_^.)_^(^)||i+;;}i/{i+.) 
<i^p^-(i + {E||^fc_,(^)||}::;}Va+'?)), 

where Ft = a{X^{u),X^_^{u), . . .). By using the above and that the support 
of W{^^) is proportional to bN, we can apply Lemma A. 2 to obtain 



E 



N 



1 



y —w 

,.bN 

k=p+l 

1 K 
- 6iVl-p 



t-k 
bN 

/ N 

E 

\k=p+l 



i+'n 1/(1+'?) 



W 



t - k 
bN 



l/(l+r;) 



<K{bN)-'^l^^+'^\ 



Thus, we have proved (34). The proof of (35) is similar to the proof above 
but requires the additional (stated) assumption, hence we omit the details. 
□ 

A.2. The covariance structure and the long memory effect of tvARCH(p)
processes. In this section, we prove results for the covariance structure and
the long memory effect of tvARCH(p) processes.

Proof of Proposition 1. It follows easily by making a time-varying
Volterra series expansion of the tvARCH(p) process [see Section 5 in Dahlhaus
and Subba Rao (2006)] and using Lemma 2.1 in Giraitis, Kokoszka and
Leipus (2000). We omit the details. □

The following lemma is used to prove Proposition 2. 

Lemma A.4. Suppose $\{X_{t,N}\}_t$ is a tvARCH(p) process which satisfies
Assumption 1(i), (ii), (iv), and let $\{X_t(u)\}_t$ be defined as in (2). Let
$h := h(N)$ be such that $h/N \to d \in [0, 1)$ as $N \to \infty$. Then we have

1 N—h /■1—d 

(36) Kn ^ I lE{x2(n)} du. 

Further, if $\{E(|Z_t|^{2(2+\zeta)})\}^{1/(2+\zeta)} \sup_u \sum_{j=1}^{p} a_j(u) \leq 1 - \delta$ for some
$0 < \zeta \leq 2$ and $\delta > 0$, then we have

(37) E KNX^+h,N ^ I E{X?{u)Xl^,{^ + d)} du. 

Proof. We first prove (37). Let b := b{N) be such that 1/b is an inte- 
ger, 6 — > and b{N — /i) — > oo as — > oo. We partition the left-hand side of 
(37) into 1/b blocks. Let kb = kb[l — d], and replace the terms -^^^(jv-/i)-i-r Af 
and with X2^(jv_/,)+r(^6) ^nd A2^(^_;,)+,+^(/c6 + d), respec- 

tively. Let N' = {N - h). Use [kbN' , {k + l)bN') then we replace X^^j^ with 

Xg{kb) and X'^_^i^j^ with Xg{kb + d). Recall the notation Sk,bN{u,h,d) and 
IJ,i{u,d,h) given in (31) and (33), respectively. Now, by using Lemma 1 and 
that < # < MllMzM, we have 

^ N-h 

JfZTh ^ ^s,NXs+h,N 
s=l 

1/6-1 ^ b(N-h)-l 



(38) 



^ E fc(Ar-/t) E Sk,bN{kb,h,d) + Rn, 



k=0 ^ ^ ■r=0 



where 



1/6-1 6Af'-l . , ^ 

\RN\<b ^ — ^ \xlb^,j^^j^y^—^VkbN>+T+h,N 
k=Q " r=0 ^ ^ 
+ 26 +



N 



WkbN 



'+r+h 



Now, taking expectations of the above, we have 



(39) 



d- 



h 
N 



^ 1 



\ 



Therefore, Rn — > as — > oo. By substituting the integral Jq ^ d, h) du 
with a sum and using (38) and (39), we have 

N-h 



1 



N-h 



N 



s=l 



fii{u, d, h) du 



l+C/2>j l/(l+C/2) 

1+C/2J 



1/6-1 



<h {E||5fc,fe(^_;,)(A;,,/i,d)-/ii(A;,,d,/i)||;+g^}i/(i+'^/2) 



(40) 



A:=0 



+ 



1/6-1 

b Y IJ'i{kb,d,h) 

k=0 



IJ-i{kb, d, h) du 



+ OUb + kb 



d- 



h 
N 



Finally, by substituting the bound (35) and 

b ^ fii{kb,d,h) ■ 
into (40), we have 

N-h 



E{Xh^)XlH{^ + d)}du 



< kb 



E 



1 



N-h 



2 

s+h,N 



s=l 



1-d 



E{XUu)Xl^{u + d)}du 



l+C/2>i l/(l+C/2) 



l+C/2- 



which gives us (37). The proof of (36) is similar and we omit the details. □ 

Proof of Proposition 2. We first consider the more general case
where $h := h(N)$ is such that $h/N \to d \in [0, 1)$ as $N \to \infty$. Then, for fixed
$h > 0$, we obtain (5) as a special case with $d = 0$.

Let $S_N(h) = A_N - B_N$, where

^ N -h ^ ^t,NXt+h,N and Bn = {Xn^- 
t=i 

We consider the asymptotic behavior of the terms An and Bn separately. 
By using (36) and (37), we have 

^ / iJ.i{u,h,d) du and Bn ^ / / ii{u) ^{v) du dv . 
Jo Jo Jo 

Recall that $\mu(u) = E\{X_t^2(u)\}$, and that $\mu_1(u, d, h)$ and $c(u, d, h)$ are defined
in (31). By using the formula $\mu_1(u, d, h) = c(u, d, h) + \mu(u)\mu(u + d)$, we obtain

(41) SN{h) ^ J {c{u,d,h) + iJ.{u)fi{u + d)} du — <^J fi{u)du 

where h/N ^ d as N ^ oo. 

Let us now consider the special case of (41) where $d = 0$. Then, for fixed
$h > 0$, we have

$S_N(h) \to_P \int_0^1 c(u, h)\,du + \int_0^1 \int_0^1 \mu^2(u)\,du\,dv - \int_0^1 \int_0^1 \mu(u)\mu(v)\,du\,dv,$

as $N \to \infty$. This proves (5) and, hence, we have the required result. □

A. 3. Proofs in Section 3.2. In this section, we prove consistency and 
asymptotic normality of the weighted kernel-NLS estimator. 

Proof of Proposition 3(i). By using (27), (28) and Slutsky's theo- 
rem, we have 

^to,N = {T^to,N}~^Lto,N ^ {E[A(no)]}-V(uo). 

Therefore, to prove that n ~^ Ql{uo), we need to show that a{uQ) = 

{E[At{uo)]}-'^r{uo). By using (2) and dividing by ^(tio, ^^-1(^0)), {^|_i(wo)}fc 
satisfies the representation 

(42) ^kM ^ a^{uo)^k-i{uo) ^ ^IM 



Finally, multiplying (42) by Xk-i{uo)/ k,{uq, X^^i^n) and taking on both 
sides expectations, we obtain the desired result. 

To prove Proposition 3(ii), we use the same methodology given in the 
proof of Theorem 3 in Dahlhaus and Subba Rao (2006), hence we omit the 
details. □ 

A.4. Proofs in Section 3.4. In this section, we prove consistency and
asymptotic normality of the two-stage kernel-NLS estimator. To prove these
asymptotic properties, we need the following two lemmas.

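Lemma A.5. Suppose $\{X_{t,N}\}_t$ is a tvARCH($p$) process which satisfies
Assumption 1(i), (ii), (iv), let $\mu(u) = E\{X_t^2(u)\}$, and let $\hat\mu_{t_0,N}$ be defined as
in (15). If $|u_0 - t_0/N| < 1/N$, then, for $0 \le i,j \le p$, we have

$$\sum_{k=p+1}^{N} \frac{1}{bN} W\Big(\frac{t_0 - k}{bN}\Big) \frac{X_{k-i,N}^2 X_{k-j,N}^2}{(\hat\mu_{t_0,N} + S_{k-1,N})^2} \overset{P}{\to} E\bigg[\frac{X_{k-i}^2(u_0) X_{k-j}^2(u_0)}{\{\mu(u_0) + S_{k-1}(u_0)\}^2}\bigg],$$

with $b \to 0$, $bN \to \infty$ as $N \to \infty$.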



Proof. To prove the result we use techniques similar to those in Bose 
and Mukherjee (2003). By using the inequality — l/y^| < 2|a; — y|{(l/a:;)[l + 
x/y]}^, for x,y>0, we bound the difference 



1^2 y2 
^k-i,N^k-j,N 



Tr2 y2 

^k~i,N^k-j,N 



(44) 



< 2Xl_i]^Xl_j j^\flto,N - lJ'{uo)\ 



1 



< 2\flto,N - /w(wo)| ( 1 + 



1 + 



fl'to,N + Sk-1,N 



fl{uo) + Sk-l,N . 
l^(^o)|\ ^k~i,N^k-j,N 



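Let us now define the following quantities:

$$\Gamma(u_0) = E\bigg[\frac{X_{k-i}^2(u_0) X_{k-j}^2(u_0)}{\{\mu(u_0) + S_{k-1}(u_0)\}^2}\bigg],$$

$$A_{t_0,N} = \sum_{k=p+1}^{N} \frac{1}{bN} W\Big(\frac{t_0 - k}{bN}\Big) \frac{X_{k-i,N}^2 X_{k-j,N}^2}{\{\hat\mu_{t_0,N} + S_{k-1,N}\}^2},$$

$$C_{t_0,N}(u_0) = \bigg| \sum_{k=p+1}^{N} \frac{1}{bN} W\Big(\frac{t_0 - k}{bN}\Big) \frac{X_{k-i,N}^2 X_{k-j,N}^2}{\{\mu(u_0) + S_{k-1,N}\}^2} - \Gamma(u_0) \bigg|.$$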



Then, by using the bound (44), we have 

Ato,N - r('uo) 



< 



1 



N 

bN 

k=p+l 



w 



y2 y2 

^k-i,N^k-j,N 



to-k \ 
bN J {fi{uo)+Sk-i,Ny 



Tiuo) 
(45)



+ 2|A.„,.-/.Ml(l + ^) 
V \fJ'to,N\ / 



^,,bN \ bN J 



■)r2 

^k--i,N^k-j,N 



k=p+l 

< Cto,N{uo) + 2\fito,N - /u(no)| 

V 



Kuo)|\ A _}_ 
\i^to,N\J ^ 



k=p+l 



bN 



W 



tp-k 
bN 



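Since $\hat\mu_{t_0,N} \overset{P}{\to} \mu(u_0)$, by using Slutsky's lemma we have

$$|\hat\mu_{t_0,N} - \mu(u_0)| \Big(1 + \frac{|\mu(u_0)|}{|\hat\mu_{t_0,N}|}\Big) \overset{P}{\to} 0.$$

Furthermore, by using (27) we have $C_{t_0,N}(u_0) \overset{P}{\to} 0$. Altogether this gives
$|A_{t_0,N} - \Gamma(u_0)| \overset{P}{\to} 0$, and the desired result follows. □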



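To show asymptotic normality, we need to define the following least-squares criteria:

(46) $$\mathcal{L}_{t_0,N}(a) = \sum_{k=p+1}^{N} \frac{1}{bN} W\Big(\frac{t_0 - k}{bN}\Big) h_{t_0,N}(X_{k,N}^2, \mathcal{X}_{k-1,N}, a),$$

(47) $$\mathcal{L}_{t_0}(u,a) = \sum_{k=p+1}^{N} \frac{1}{bN} W\Big(\frac{t_0 - k}{bN}\Big) h_{t_0,N}(X_k^2(u), \mathcal{X}_{k-1}(u), a),$$

(48) $$\mathcal{L}^{(\mu)}_{t_0}(u,a) = \sum_{k=p+1}^{N} \frac{1}{bN} W\Big(\frac{t_0 - k}{bN}\Big) h(u, X_k^2(u), \mathcal{X}_{k-1}(u), a),$$

where $h_{t_0,N}(y_0, y, a) = (\hat\mu_{t_0,N} + \sum_{j=1}^p y_j)^{-2}(y_0 - a_0 - \sum_{j=1}^p a_j y_j)^2$ and
$h(u, y_0, y, a) = (\mu(u) + \sum_{j=1}^p y_j)^{-2}(y_0 - a_0 - \sum_{j=1}^p a_j y_j)^2$. We note that
$\hat a_{t_0,N} = \arg\min_a \mathcal{L}_{t_0,N}(a)$. Asymptotic normality of $\sqrt{bN}\,\nabla\mathcal{L}^{(\mu)}_{t_0}(u_0, a(u_0))$ can easily
be established by verifying the conditions of the martingale central limit
theorem. However, the same theorem cannot be used to show the asymptotic
normality of $\sqrt{bN}\,\nabla\mathcal{L}_{t_0}(u_0, a(u_0))$, since $\nabla\mathcal{L}_{t_0}(u_0, a(u_0))$ is not a sum of
martingale differences. In Lemma A.6 below we overcome this problem by
showing that $\sqrt{bN}\{\nabla\mathcal{L}_{t_0}(u_0, a(u_0)) - \nabla\mathcal{L}^{(\mu)}_{t_0}(u_0, a(u_0))\} \overset{P}{\to} 0$, which allows us to
replace $\nabla\mathcal{L}_{t_0}(u_0, a(u_0))$ with $\nabla\mathcal{L}^{(\mu)}_{t_0}(u_0, a(u_0))$.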
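
The least-squares criteria above are quadratic in $a$, so a minimizer can be obtained by solving a linear system. The sketch below illustrates this for a criterion of the form $\sum_k w_k (\hat\mu + \sum_j y_{k,j})^{-2} (y_{k,0} - a_0 - \sum_j a_j y_{k,j})^2$. The uniform kernel weights, the windowed mean used for $\hat\mu$, and the names `kernel_nls` and `bandwidth` are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np

def kernel_nls(x2, p, t0, bandwidth):
    """Closed-form minimizer of a kernel-weighted, normalized least-squares criterion.

    x2: squared observations; p: ARCH order; t0: centre index;
    bandwidth: half-width (in observations) of a uniform window (assumed kernel).
    """
    x2 = np.asarray(x2, dtype=float)
    k = np.arange(p, x2.size)                  # indices with p lags available
    # Regressors (X^2_{k-1}, ..., X^2_{k-p}) and responses X^2_k.
    lags = np.column_stack([x2[k - j] for j in range(1, p + 1)])
    design = np.column_stack([np.ones(k.size), lags])
    y = x2[k]
    # Uniform kernel weights around t0 (hypothetical choice of W).
    w = (np.abs(k - t0) <= bandwidth).astype(float)
    # Normalization (mu_hat + sum of lagged squares)^2; mu_hat is a windowed mean here.
    mu_hat = np.average(y, weights=w)
    weights = w / (mu_hat + lags.sum(axis=1)) ** 2
    # Weighted normal equations R a = r.
    R = design.T @ (weights[:, None] * design)
    r = design.T @ (weights * y)
    return np.linalg.solve(R, r)

# Noise-free check: squares generated by x2[k] = 0.1 + 0.9 * x2[k-1].
x2 = [5.0]
for _ in range(29):
    x2.append(0.1 + 0.9 * x2[-1])
estimate = kernel_nls(x2, p=1, t0=15, bandwidth=10)
```

Since the example data satisfy the linear recursion exactly, the weighted residuals vanish at $(a_0, a_1) = (0.1, 0.9)$ and the linear solve recovers these values up to rounding, whatever positive weights are used.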

Lemma A.6. Suppose $\{X_{t,N}\}_t$ is a tvARCH($p$) process which satisfies
Assumption 1(i), (ii), (iii), (iv). Let $\mu(u) = E\{X_t^2(u)\}$, and $\hat\mu_{t_0,N}$, $\mathcal{L}_{t_0}(u,a)$
and $\mathcal{L}^{(\mu)}_{t_0}(u,a)$ be defined as in (15), (47) and (48), respectively. If $|u_0 - t_0/N| < 1/N$,
then we have

(49) $$|\hat\mu_{t_0,N} - \mu(u_0)| = O_p\big(b^{\beta} + (bN)^{-\eta/(1+\eta)}\big),$$

and

(50) $$\sqrt{bN}\,[\nabla\mathcal{L}_{t_0}(u_0, a(u_0)) - \nabla\mathcal{L}^{(\mu)}_{t_0}(u_0, a(u_0))] = o_p(1),$$

with $b \to 0$, $bN \to \infty$ as $N \to \infty$.

Proof of Proposition 4. (i) It is straightforward to show consistency 
using (17) and Lemma A. 5. 

(ii) Define $B_{t_0,N}(a) = \mathcal{L}_{t_0,N}(a) - \mathcal{L}_{t_0}(u,a)$. To prove that $\nabla B_{t_0,N}(a(u_0)) = O_p(b^{\beta})$,
we use the same arguments given in Theorem 3 in Dahlhaus and
Subba Rao (2006), hence we omit the details. To prove (17), we use that
$\nabla\mathcal{L}_{t_0,N}(\hat a_{t_0,N}) = 0$, and expanding $\nabla\mathcal{L}_{t_0,N}(\hat a_{t_0,N})$ about $a(u_0)$, we have

V^Ao,7v(ato,Ar - a(Mo)) 

= ^io^ (^0 ,a{uo)) + {VCto {uo ,a{uo)) - VC[f {uq , a{uo)) } 

- Vi3to,Ar(a(uo)). 

By using Lemma A. 5, we easily see that V^£to^Ar(a(iio)) 2E[^|^^ (uq)] and, 
using (50), we have 

= -^{v4^^(^zo,a(no)) + v4„^(a(no))}{E[^('^)(t.o)]}-' + Op(^^). 

Finally, by using the martingale central limit theorem [see, e.g., Hall and
Heyde (1980), Theorem 3.2], we obtain (17). □ 

A.5. Proofs in Section 5. In this section, we prove the results in Section 5.
Some of the results in this section have been inspired by corresponding
results in the residual bootstrap for linear processes literature [cf. Franke
and Kreiss (1992)]. However, the proofs are technically very different,
because the tvARCH($p$) process is a nonlinear, nonstationary process, and because of the
normalization of the two-stage kernel-NLS estimator with random weights.

In order to show that the distribution of the bootstrap sample $\hat a^{+}_{t_0,N} - \hat a_{t_0,N}$
asymptotically coincides with the asymptotic distribution of $\hat a_{t_0,N} - a(u_0)$,
we will show convergence of the distributions under the Mallows distance.
The Mallows distance between the distributions $H$ and $G$ is defined as

$$d_2(H,G) = \inf_{X \sim H,\; Y \sim G} \{E(X - Y)^2\}^{1/2}.$$
Roughly speaking, if $d_2(F_n, G_n) \to 0$, then the limiting distributions of $F_n$
and $G_n$ are the same [Bickel and Freedman (1981)]. Following Franke and
Kreiss (1992), to reduce notation, we let $d_2(X,Y) = d_2(H,G)$, where the
random variables $X$ and $Y$ have measures $H$ and $G$, respectively.
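
For two empirical distributions on the real line with the same number of equally weighted atoms, the infimum in the definition of $d_2$ is attained by pairing order statistics, which is a standard fact for the quadratic cost in one dimension. The following sketch uses this to compute $d_2$; the function name `mallows_d2` is an assumption introduced for illustration.

```python
import numpy as np

def mallows_d2(x, y):
    """Mallows (L2-Wasserstein) distance between two equal-size empirical distributions."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch requires samples of equal size")
    # Pairing order statistics is the optimal coupling for squared distance in 1D.
    return float(np.sqrt(np.mean((x - y) ** 2)))

def index_coupling_rms(x, y):
    """Root mean squared difference under the index-by-index coupling (an upper bound)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sqrt(np.mean((x - y) ** 2)))

shifted = mallows_d2([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])     # pure location shift
permuted = mallows_d2([3.0, 1.0, 2.0], [1.0, 2.0, 3.0])    # same atoms, different order
```

Any particular coupling gives an upper bound on $d_2$. The proof of (51) uses this with an index-by-index pairing of estimated and true residuals, which corresponds to `index_coupling_rms` in the sketch.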
We also require the following definitions. Let 



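Proposition 6. Suppose Assumption 1 holds, and suppose either
$\inf_j a_j(u_0) > 0$ or $[E(Z_t^4)]^{1/2} \sup_u [\sum_{j=1}^p a_j(u)] < 1 - \delta$ [which implies
$\sup_{k,N} E(X_{k,N}^4) < \infty$]. Let $F$ be the distribution function of $Z_t^2$. Then we have

(51) $$d_2(\hat F_{t_0,N}, F) \overset{P}{\to} 0.$$

Furthermore, if we suppose $b^{\beta}\sqrt{bN} \to 0$, then we have

(52) $$d_2\big(\sqrt{bN}\,(r^{+}_{t_0,N} - \mathcal{R}^{+}_{t_0,N}\hat a_{t_0,N}),\; \sqrt{bN}\,(r_{t_0,N}(u_0) - \mathcal{R}_{t_0,N}(u_0) a(u_0))\big) \overset{P}{\to} 0$$

and

(53) $$\mathcal{R}^{+}_{t_0,N} \overset{P}{\to} E\{A_t^{(\mu)}(u_0)\},$$

with $b \to 0$, $bN \to \infty$ as $N \to \infty$.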



We prove each part of the proposition below. 



Proof of (51). To prove this result, we define the empirical distribution
function of the true residuals, that is,

$$F_{t_0,N}(x) = \frac{1}{2bN} \sum_{k=t_0-bN}^{t_0+bN-1} I_{(-\infty,x]}(Z_k^2),$$

noting that $\hat Z_k^2$ is an estimator of $Z_k^2$. [It is worth pointing out that in a different
context, the empirical distribution of the estimated residuals of a stationary
ARCH($p$) process was considered in Horvath, Kokoszka and Teyssiere
(2001).] We first observe that since $d_2$ is a distance it satisfies the triangle
inequality $d_2(\hat F_{t_0,N}, F) \le d_2(\hat F_{t_0,N}, F_{t_0,N}) + d_2(F_{t_0,N}, F)$. By using Lemma
8.4 in Bickel and Freedman (1981), it can be shown that $d_2(F_{t_0,N}, F) \overset{P}{\to} 0$.
Therefore, to prove (51), we need only show that $d_2(\hat F_{t_0,N}, F_{t_0,N}) \overset{P}{\to} 0$.
By definition of $d_2$ and the measures $\hat F_{t_0,N}$ and $F_{t_0,N}$, we have

$$d_2(\hat F_{t_0,N}, F_{t_0,N})^2 = \inf E(Z_t^{+2} - Z_t^2)^2,$$

where the infimum is taken over all joint distributions on $(Z_t^{+2}, Z_t^2)$ which
have marginals $\hat F_{t_0,N}$ and $F_{t_0,N}$. Let us suppose $P(J = i) = 1/(2bN)$,
for $i \in \{-bN, \ldots, bN - 1\}$, and define $Z_t^{+2} = \hat Z_{t_0+J}^2$ and $Z_t^2 = Z_{t_0+J}^2$. Then, since
$(\hat Z_{t_0+J}^2, Z_{t_0+J}^2)$ have marginals $\hat F_{t_0,N}$ and $F_{t_0,N}$, respectively, we have



d2{Ft,,N,Ft,,Nf <E{Zj - Zjf 



to+bN-l 



2bN 



< 



k=to-bN 
1 



2bN 



k=to-bN \ 



2bN 



to+bN N 



k=to-bN 



By adding and subtracting ^ J2k=t'o-bN and using that ^ Ek=t'o-bNiZl 
1) — > 0, we have 



d2{Fto^N,FtQ^,^)'^ 



to+bN-l 



< 



2bN 



E 



1 



to+bN-l 



k=to-bN 



+ 



2bN 
1 



' zl) 



2bN 



E (z, 

k=to-bN 
to+bN-l 

E (^1-1 



K 



to+bN-l 



K 



k=to-bN 
to+bN-l 



k=to-bN 



- bN ^ 



to+bN-l 

E 

k=to-bN 



Zl 



k=to-bN 

ao(fc/iV)-at„,7v(0) 



1 2 



+ Eh-(W-«to,^(j)]^L,,jv 



+ Op(l) 



where $\hat\sigma_{k,N}^2 = \hat a_{t_0,N}(0) + \sum_{j=1}^p \hat a_{t_0,N}(j) X_{k-j,N}^2$. Now, by bounding the above
in two different ways, we obtain

d2iFto,N,Ftf^ 



< mill < 
^ \a,{k/N)-atMj)\ \ " (J_



^ 1;;. 2^ ^k^k~j,N 



+ Op(l). 



To show (51), we need to use the bounds above noting that the bound we 
use depends on the conditions we have placed on the parameters By 
using \ajiu) — aj{v)\ < K\u — and E [tg — + ~ l]i have 

< + |a(Mo) - Oio^Tvl. 
Since la^^ ^ — ^(^^o)| 0, by using the above, it is straightforward to show 



if infj (li) > 0, 




i^k-j,N I ^ 

if supfcjvE(^fc,7v) <oo> 



with $b \to 0$, $bN \to \infty$ as $N \to \infty$. Therefore, under the stated assumptions,
and by using the above convergence in probability, we have that
$d_2(\hat F_{t_0,N}, F_{t_0,N}) \overset{P}{\to} 0$. Altogether this means that $d_2(\hat F_{t_0,N}, F) \overset{P}{\to} 0$, with
$b \to 0$, $bN \to \infty$ as $N \to \infty$, thus we obtain the result. □

It follows from the above [Bickel and Freedman (1981), Lemma 8.3] that
$E(Z_t^{+2}) \overset{P}{\to} E(Z_t^2)$, $E(Z_t^{+4}) \overset{P}{\to} E(Z_t^4)$ and

(54)

where the infimum is taken over all joint distributions on $(Z_t^{+2}, Z_t^2)$ which
have marginals $\hat F_{t_0,N}$ and $F_{t_0,N}$. We use these limits to prove the results
below.

To prove Proposition 6 we require the following definitions:

$$\tilde X_t^{+2}(u_0) = \tilde\sigma_t^{+2}(u_0) Z_t^{+2}, \qquad \tilde\sigma_t^{+2}(u_0) = a_0(u_0) + \sum_{j=1}^p a_j(u_0) \tilde X_{t-j}^{+2}(u_0),$$

and Lemma A.7 below. We note that $\tilde X_t^{+2}(u_0)$ is very similar to $X_t^{+2}(u_0)$, but
the estimated parameters $\hat a_{t_0,N}$ have been replaced by the true parameters
$a(u_0)$.

In the lemma below we show that for $t \in [t_0 - bN/2, t_0 + bN/2 - 1]$, the
distributions of $X_t^{+2}(u_0)$ and $\tilde X_t^{+2}(u_0)$ are sufficiently close and the difference
is uniformly bounded over $t$.

Lemma A. 7. Suppose assumptions in Proposition 6 hold, then we have 

oo 

(55) E|X+2(no) - Xt\uo)\ < C\at,^^ - a{uo)\ ^ k\l - 6)'' ^ 0, 

k=l 

where $b \to 0$, $bN \to \infty$ as $N \to \infty$ and where the expectation is conditioned
on $\{X_{k,N}\}$. Furthermore, for $t_0 - bN/2 \le t \le t_0 + bN/2$ we have
infE|X+2(no)-X2(no)| 

oo 

< + (E|Z+2|)^+^^/(2rt)(i _ s)k 

oo 

+ Cinf E|Z+2 - Zl\ ^{1 + E(Z+2) + . . . + [E(Z+2)]'^-i}(l - bf 

k=l 

where $b \to 0$, $bN \to \infty$ as $N \to \infty$. The expectation is with respect to the
measure on all independent pairs $\{(Z_t^{+2}, Z_t^2)\}_t$, and the infimum is taken
over all joint distributions on $(Z_t^{+2}, Z_t^2)$ which have marginals $\hat F_{t_0,N}$ and
$F_{t_0,N}$, respectively.

Proof. It can be shown that the stationary ARCH($\infty$) process has a
solution which can be written in terms of a Volterra series [Giraitis, Kokoszka
and Leipus (2000)]. Define for all $j > p$, $a_j(u_0) = 0$ and $\hat a_{t_0,N}(j) = 0$. Then,
by following Giraitis, Kokoszka and Leipus (2000), $X_t^{+2}(u_0)$, $\tilde X_t^{+2}(u_0)$ and
$X_t^2(u_0)$ have the solutions


xr{u,) 

k=Ojk<--- jo-jo 



N { k '\ k 

E E n«*o,iv(is-js+i) n^^ 

k=Ojk<---jo-jo=t^s=0 ) s=l 

N ( k ^ k 

fc=0 7fe< - io:io=i '>s=0 J s=l 



oo 



k •) k 

k=Ojk<---jo-jo=t ^s=0 ) s=l 
respectively. We first consider



N 

E E < 

k=Ojk<-<jo-jo=t 



ato,N{js - js+l) 




Now, by repeatedly taking differences, and using that $\sup_u \sum_{j=1}^p a_j(u) < 1 - \delta$,
$\sum_{j=1}^p \hat a_{t_0,N}(j) < 1 - \delta$ and $\|\hat a_{t_0,N} - a(u_0)\|_2 \overset{P}{\to} 0$, we obtain (55).

To prove (56), we first note that the expectation is taken with respect to the
joint measure on the independent pairs $\{(Z_t^{+2}, Z_t^2)\}_t$. Using the Volterra
expansions above, we have

E|X+2(no)-X2(no)| 

oo fc fc fc 

< E E n n ^^-Eu.. - n ^Lel,. 



(57) 



■Jfe<P«=0 



s=l 



s=l 



V 



k=0 l<ji 
+ Op{l). 

We see from (54), if to - bN < k < to + bN - 1 and by setting Z^"^ = Z^, we 

have E| - I ^ 0. Therefore, for ah te[to- bN/2, to + bN/2 - 1] , and t 

bN/2 <i<t,we will show that inf E| ns=i • " Ils=i ^Lx-- ■ I " 

0. This allows us to obtain a uniform rate of convergence for E|Xi+2(uo) - 
X^{uo)\ for ah to — bN/2 < k< to + bN /2 — 1. To obtain this rate, we partition 
the inner sum above into two sums, where Xli=i < bN/2 and Y^^=ijs > 
bN/2. We further note that since for ah i, l<ji< p, if i;J=i js > bN/2, then 
this implies k > bN/{2p). Altogether this gives 

(58) nX+\uo) - X?{uo)\ <I + n + Op(l), 

where 



= 1 



WZ^^s . 



= 1 



I = ao{uo)Y^ [|aj^(uo)E 
and 

oo k 

E E n«.-=M{(E|Z||)'= + (E|Z+2|)'=}. 

k>bN/{2p) l<ji,-,jk<ps=0 

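We now study $I$ and consider, in particular, the difference
$E|\prod_{s=1}^k Z_{j_s}^{+2} - \prod_{s=1}^k Z_{j_s}^2|$. By repeatedly taking differences, we have

$$E\bigg|\prod_{s=1}^k Z_{j_s}^{+2} - \prod_{s=1}^k Z_{j_s}^2\bigg| \le E|Z_t^{+2} - Z_t^2| \{1 + E(Z_t^{+2}) + \cdots + [E(Z_t^{+2})]^{k-1}\}.$$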
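
The phrase "repeatedly taking differences" refers to a telescoping decomposition of a difference of products. As a numerical check, the sketch below verifies the identity $\prod_s a_s - \prod_s b_s = \sum_r (\prod_{s<r} a_s)(a_r - b_r)(\prod_{s>r} b_s)$ on arbitrary numbers. Taking absolute values and expectations term by term, with independent pairs and assuming $E(Z_t^2) = 1$, then gives a bound of the form used here. The function name `telescoped_difference` is an assumption introduced for illustration.

```python
import math

def telescoped_difference(a, b):
    """Return sum over r of prod(a[:r]) * (a[r] - b[r]) * prod(b[r+1:])."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    total = 0.0
    for r in range(len(a)):
        # Leading factors come from a, the r-th factor is the difference,
        # and the trailing factors come from b.
        total += math.prod(a[:r]) * (a[r] - b[r]) * math.prod(b[r + 1:])
    return total

a = [1.5, 0.7, 2.0, 1.1]
b = [1.0, 0.9, 1.8, 1.3]
direct = math.prod(a) - math.prod(b)
telescoped = telescoped_difference(a, b)
```

Each summand contains exactly one difference factor, which is why the resulting bound is a single term $E|Z_t^{+2} - Z_t^2|$ multiplied by a geometric-type sum of powers of $E(Z_t^{+2})$.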
Substituting the above into $I$, taking the infimum over all joint measures
on $(Z_t^{+2}, Z_t^2)$, and using $\sup_u \sum_{j=1}^p a_j(u) < 1 - \delta$, we obtain



k k 



1 s=l 



oo k 

ao{uo)Y^ Y[aj^{uo)miE 
(59) <C{infE|Z+2_z2|} 

N 

X ^{1 + E(Z+2) + . . . + [E(Z+2)]fc}(i _ 5)^- + 0^(1). 

k=l 

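We note that in the above we have extended the sum beyond $\sum_{s=1}^k j_s \le bN/2$
to make the summands easier to handle. Our aim is to show that the
right-hand side of (59) converges in probability to 0. For any $\varepsilon_1 > 0$, define
$B_N^{\varepsilon_1} := \{E|Z_t^{+2}| > 1 + \varepsilon_1\}$. By (54), we have $P(B_N^{\varepsilon_1}) \to 0$ as $N \to \infty$. Denote
further, for $\varepsilon > 0$,

$$A_N^{\varepsilon} := \Big\{ C\{\inf E|Z_t^{+2} - Z_t^2|\} \sum_{k=1}^{N} \{1 + E(Z_t^{+2}) + \cdots + [E(Z_t^{+2})]^k\}(1 - \delta)^k > \varepsilon \Big\}.$$

For $\varepsilon_1 < \delta/(1 - \delta)$, we have

$$P(A_N^{\varepsilon}) = P(A_N^{\varepsilon} \mid B_N^{\varepsilon_1}) P(B_N^{\varepsilon_1}) + P(A_N^{\varepsilon} \mid (B_N^{\varepsilon_1})^c) P((B_N^{\varepsilon_1})^c)$$

$$\le P(B_N^{\varepsilon_1}) + P\Big( C\{\inf E|Z_t^{+2} - Z_t^2|\} \sum_{k=1}^{N} (k+1)(1 + \varepsilon_1)^k (1 - \delta)^k > \varepsilon \Big)$$

$$\le P(B_N^{\varepsilon_1}) + P(C_1 \inf E|Z_t^{+2} - Z_t^2| > \varepsilon) \to 0,$$

which demonstrates the convergence in probability to 0 of the right-hand side
of (59).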

We now consider the second term $II$. Since $k > bN/(2p)$ and
$\sup_u \sum_{j=1}^p a_j(u) < 1 - \delta$, it is straightforward to show

$$II \le a_0(u_0) \sum_{k > bN/(2p)}^{\infty} \{1 + (E|Z_t^{+2}|)^k\}(1 - \delta)^k \le a_0(u_0)(1 - \delta)^{bN/(2p)} \sum_{k=1}^{\infty} \big(1 + (E|Z_t^{+2}|)^{k + bN/(2p)}\big)(1 - \delta)^k.$$

Now it is straightforward to show that $II \overset{P}{\to} 0$ with $b \to 0$, $bN \to \infty$ as
$N \to \infty$. Altogether we obtain (56), and the desired result follows. □

We note that the bounds given in Lemma A.7 are uniform for all
$t_0 - bN/2 \le t \le t_0 + bN/2$; this is required to prove (52). As a byproduct of
Lemma A.7, we have the following result.
Corollary 1. Suppose the assumptions in Lemma A.7 hold. Then, for
all $t \in [t_0 - bN/2, t_0 + bN/2 - 1]$, we have

(60) $$E|\sigma_t^{+2}(u_0) - \tilde\sigma_t^{+2}(u_0)| \overset{P}{\to} 0,$$

(61) $$\inf E|\tilde\sigma_t^{+2}(u_0) - \sigma_t^2(u_0)| \overset{P}{\to} 0,$$

where $b \to 0$, $bN \to \infty$ as $N \to \infty$, and the expectations are defined in the
same way as in Lemma A.7.

Proof. By using the expressions $\sigma_t^{+2}(u_0) = \hat a_{t_0,N}(0) + \sum_{j=1}^p \hat a_{t_0,N}(j) X_{t-j}^{+2}(u_0)$
and $\sigma_t^2(u) = a_0(u) + \sum_{j=1}^p a_j(u) X_{t-j}^2(u)$, and taking also into account
that $\hat a_{t_0,N} \overset{P}{\to} a(u_0)$, the desired result follows immediately from Lemma
A.7. □



In order to prove (52), we require the following inequalities.
Let us suppose $\sigma_x^2 = \alpha_0 + \sum_{j=1}^p \alpha_j x_j$ and $\sigma_y^2 = \beta_0 + \sum_{j=1}^p \beta_j y_j$, with $\{\alpha_j\}$, $\{\beta_j\}$,
$\{x_j\}$ and $\{y_j\}$ positive. Then, it can be shown that



(62) 



where 



2 



ZxC^lvi 



< 



Kzl{A + B) 
Ato,Y 



2E 



A- 



Aio,Y 



+ ^ aj and B ■ 



Ato,Y 



Similarly, we have 



2 

ZxfJ^Xi 



{f^to,N + J2^j=iXj)'^ {fito,N + E%iyj) 



(63) 



<K{A + B)\z, 



+ 



Kzl{A + Bf 



2EI 



We use these inequalities to prove the following result. 

Proof of (52). By definition of the Mallows metric, independence of the
pairs $\{(Z_t^{+2}, Z_t^2)\}_t$, and that $E(Z_t^{+2}) = 1$, we have



ci2{\/WV(r+_^ - 7^+^^aiQ_^^), \/WV(f ^(uo) - 7^Af (u^ 
1 2



(64) 



where the infimum is taken over all joint measures on $(Z_t^{+2}, Z_t^2)$. We now
consider



E 



<2(/ + //), 

where 

^ ^ - l)at\u,)Xtl,{u,) {Zt' - irat\u,)Xtl,{u,W 

II e( ^^'' ~ ^)^t^^^)Xt'^{n,) {Zl - iyal{u,)XU{u,) Y 

Studying first /, and using (63), we have 
/ < KE(Z+2 - if 

cxf (no)X+!,(no) af{uo)Xy^,{uo) 



X E 



< 



X |2X:]E|X+_2,.(no) -X,tV^o)| +]E|a+2(no) -a+2(no)||, 

where A, = + EU and i?i = |^ + E?=i a, (^o)- There- 

fore, by using (54), (55) and (60), we have / — > 0. Bounding // by using (64), 
we have 

//< (Ai + Bi)E|Z+2-Z||2 

^ KE{Z+'){A,+B,y 
X |2f]E|X+!,(no)-X,%.(7.o)|+IE|a,t2(no)-^iK)||.

Substituting the above bounds into (65), we have 
d2{\/WV(r+ - 7^^o,A^(no)+a^^,_^), V6iV(r^(uo) - 
</ + //, 

where 

" 4E(Z+^-l)^(Ai+^i)^ 
2j2E\X+^^{u,)-X+^.{uo)\+E\a+\uo) - (no)|
. j=i 



II 



4E(Z+^)(Ai + ^i)^ 

X |2f]infE|l+„2^.(uo) -X|_^(no)| +infE|a+2(no) -a^ 



and c^Tv = 5^ E?=i T.fJm/2 W{i^f. By using (54), (55) and (60), we have 



bN ^j=l ^k=bN/2 " y b 

$I \overset{P}{\to} 0$. By using (54), (56) and (61), we have $II \overset{P}{\to} 0$. Altogether we obtain
the required result. □



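Proof of (53). We use the same methods as those in the proof of
(52) to show that $d_2(\mathcal{R}^{+}_{t_0,N}, \mathcal{R}_{t_0,N}(u_0)) \overset{P}{\to} 0$. Then, by using Lemma 8.3 in
Bickel and Freedman (1981), and $\mathcal{R}_{t_0,N}(u_0) \overset{P}{\to} E[A_t^{(\mu)}(u_0)]$, we have
$\mathcal{R}^{+}_{t_0,N} \overset{P}{\to} E[A_t^{(\mu)}(u_0)]$, thus obtaining the desired result. □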

We now have the necessary ingredients to prove Proposition 5. 

Proof of Proposition 5. We observe that 



Now, since by (53) we have $\mathcal{R}^{+}_{t_0,N} \overset{P}{\to} E[A_t^{(\mu)}(u_0)]$, we can replace in the above
$\mathcal{R}^{+}_{t_0,N}$ with $E[A_t^{(\mu)}(u_0)]$, and then use the delta method and (52) to get the
required result. □